Abstract

The temporal logic model checking algorithm of Clarke, Emerson, and Sistla [17] is modified
to represent state graphs using binary decision diagrams (BDDs) [7] and partitioned
transition relations [10, 11]. Because this representation captures some of the regularity
in the state space of circuits with data path logic, we are able to verify circuits with an
extremely large number of states. We demonstrate this new technique on a synchronous
pipelined design with approximately $5 \times 10^{120}$ states. Our model checking algorithm handles
full CTL with fairness constraints. Consequently, we are able to express a number of important
liveness and fairness properties, which would otherwise not be expressible in CTL. We
give empirical results on the performance of the algorithm applied to both synchronous and
asynchronous circuits with data path logic.
1  Introduction


Bugs found late in the design phase of a digital circuit are a major cause of unexpected
delays in realizing the circuit in hardware. As a result, interest in formal verification techniques
for hardware designs has been growing. A number of different methods have been
proposed, but nearly all can be classified in terms of the natural division between the data
paths and the controlling circuitry in digital circuits. The most successful methods to date
for  verifying  data  path  logic  treat  only  functional  behavior,  without  considering  sequential 
behavior.  These  methods  are  frequently  based  on  the  use  of  automatic  theorem  provers 
or  proof  checkers  and  may  require  considerable  assistance  from  the  user  in  constructing  a 
correctness  proof.  In  contrast,  the  most  effective  techniques  for  reasoning  about  sequential 
behavior  usually  require  a  complete  exploration  of  the  state  space  of  the  circuit  [6,  21,  25]. 
The  state  exploration  techniques  are  attractive  because  they  are  highly  automatic:  the  user 
simply  provides  a  description  of  the  circuit  implementation  and  its  specification;  the  system 
does  the  rest.  In  the  case  of  a  single  controller,  the  approach  is  often  quite  practical,  since 
the  number  of  states  tends  not  to  be  excessive.  The  approach  has  not  been  very  useful  with 
data  paths,  however,  since  the  number  of  states  is  almost  always  too  large  to  permit  explicit 
enumeration.  In  order  to  reason  about  the  complex  interaction  between  controllers  and  data 
paths  we  need  techniques  that  are  able  to  handle  both  types  of  circuits.  Developing  such 
techniques  has  proven  to  be  a  very  difficult  problem.  However,  the  regularity  of  data  path 
designs  provides  some  reason  to  believe  that  their  state  graphs,  while  large,  will  often  have  a 
relatively  simple  structure.  Consequently,  it  may  be  possible  to  find  a  concise  representation 
that exploits the uniformity of the state space and depends in size more on the inherent
complexity  of  the  data  path  logic  than  simply  the  number  of  states  it  determines. 

In this paper, we show how temporal logic model checking [12, 13, 14, 16, 17] and reachability
analysis algorithms can be modified to represent state graphs using binary decision
diagrams (BDDs) [7]. Because this representation captures some of the regularity in the
state  space  determined  by  sequential  circuits,  we  are  able  to  verify  sequential  circuits  with 
an  extremely  large  number  of  states.  The  algorithms  are  based  on  computing  fixed  points 
of  functions,  called  predicate  transformers,  that  map  sets  of  states  to  sets  of  states.  The 
predicate  transformers  are  used  to  describe  properties  of  circuits  and  are  derived  from  the 
transition relations of the circuits. Both state sets and predicate transformers are represented
with BDDs. Thus, we are able to avoid explicitly constructing the state graph of
the circuit. We have tested the performance of the algorithms on both synchronous and
asynchronous circuits with data path logic. We were able to verify a pipelined ALU with
over $10^{120}$ states and an asynchronous stack with over $10^{50}$ states. More importantly, for the
classes  of  circuits  that  we  verified,  the  CPU  time  required  increased  as  a  small  polynomial 
in  the  number  of  components  of  the  circuit.  These  results  provide  strong  evidence  of  the 
scalability  of  our  methods. 

1.1  Contributions 

The  major  contributions  of  this  paper  are  as  follows. 

1. A BDD-based algorithm for CTL model checking with fairness constraints.

2. A description of disjunctive partitioned transition relations and conjunctive partitioned
transition relations. With these methods, computing the transition relation of a circuit
never limits the size of the circuits that can be verified.

3. A modified breadth first search algorithm to speed up reachability analysis for circuits
represented with disjunctive partitioned transition relations.

4. A thorough empirical study of the asymptotic complexity of our methods using several
substantial examples.

5. General techniques for improving the efficiency of verification methods based on reachability
analysis by viewing such verification as automatically constructing and checking
an invariant.

Several  of  the  above  contributions  are  full  length  descriptions  of  results  the  current  authors 
first  described  in  the  conference  literature  [10, 11, 12, 13,  14]. 

1.2  Related  Work 

There are a number of approaches for verifying sequential circuits by state exploration techniques.
Not long after Bryant described BDDs [7], several groups began adapting state
exploration algorithms for use with BDDs.

Coudert, Berthet, and Madre developed a method for showing equivalence between two
deterministic finite automata [18]. Given two automata, they perform a breadth first search
of the state space of the product automaton. BDDs are used to represent sets of states and
the possible transitions of the automata. For the latter, a transition function vector is used.
This is a vector of BDDs, one for each state bit, that represents the next state logic of the
circuit. Cho et al. [15] discuss a similar technique.

Several  groups  have  independently  applied  BDDs  to  CTL  model  checking  [16, 17].  Burch, 
Clarke,  McMillan  and  Dill  [12]  have  developed  a  symbolic  CTL  model  checker  that  uses 
transition relations to represent the circuit being verified. Coudert et al. [20] and Bose and
Fisher [3] have described BDD-based algorithms for CTL model checking that use transition
function vectors for this purpose. Since all three of these verification techniques are based
on  CTL,  they  are  able  to  handle  specifications  that  include  unbounded  liveness  properties. 
Such  specifications  cannot  be  handled  by  other  symbolic  techniques  for  sequential  circuit 
verification  such  as  those  described  by  Bryant  and  Seger  [9],  Bose  and  Fisher  [2],  and  Coudert 
et al. [18]. In addition, the algorithm of Burch et al. permits arbitrary CTL formulas to be
used as fairness constraints [17].

A  serious  limitation  of  the  approaches  that  use  transition  function  vectors,  as  opposed  to 
transition  relations,  is  that  they  cannot  model  nondeterministic  systems  in  a  natural  way. 
When modeling systems for verification, there are two major sources of nondeterminism.
First, nondeterminism can occur because of concurrency in the underlying circuit (as is the
case in most asynchronous circuit models). Second, nondeterminism arises when abstraction
is used to simplify reasoning about some part of the circuit. Because abstraction may
hide part of the state of the circuit, a transition may appear nondeterministic even though it
was originally deterministic. As an example of these two situations, consider the cache coherency
protocol for the Encore Gigamax that McMillan has investigated [29, 30, 31]. The
protocol was designed for a shared memory multiprocessor organized as a series of buses
connected  by  an  asynchronous  hierarchical  routing  network.  The  caches  on  each  bus  are 
kept  consistent  using  bus  snooping,  while  a  complex  message  passing  protocol  is  used  to 
ensure  consistency  between  caches  on  different  buses.  McMillan  modeled  the  system  as  an 
asynchronous  composition  of  synchronous  finite  state  machines.  He  also  used  abstraction 
to simplify the verification, which made the model even more nondeterministic. For example,
McMillan did not precisely model the cache replacement mechanism. When modeling
asynchronous  systems  or  using  abstraction,  it  is  often  necessary  to  use  fairness  constraints 
to  make  accurate  models.  For  example,  fairness  constraints  are  required  to  describe  a  gate 
with  an  arbitrary  but  finite  delay. 

If a single BDD is used to represent a transition relation, the size of the BDD can become
a bottleneck. This problem can be solved for asynchronous circuits by representing
the transition relation as an implicit disjunction of BDDs [12], a technique we now call
disjunctive partitioned transition relations. Adapting this technique to synchronous circuits
requires conjunctive partitioned transition relations. Touati et al. [34] and Burch, Clarke
and Long [10, 11] developed methods for computing an image of a conjunctive partitioned
transition relation (the latter method is described in section 5). The efficiency of both techniques
derives from early quantification of state variables. We believe that our technique
often allows more early quantification, and so is more efficient. The available empirical
results support this conclusion, although more experimentation is necessary before a definitive
conclusion can be reached. A detailed comparison of the two methods is presented in
section 9.

Bryant,  Seger  and  Beatty  [8, 9]  have  developed  an  algorithm  based  on  symbolic  simulation 
for  model  checking  in  a  restricted  linear  time  logic.  A  specification  consists  of  preconditions 
and  postconditions  expressed  in  the  logic.  The  preconditions  are  used  to  restrict  inputs  and 
initial states of the circuit; the postcondition gives the property that the user wishes to
check.  Formulas  in  the  logic  have  the  form 

$$p_0 \wedge X p_1 \wedge X^2 p_2 \wedge \cdots \wedge X^{n-1} p_{n-1} \wedge X^n p_n.$$

Note  that  the  syntax  of  the  formulas  is  highly  restricted  compared  to  most  other  temporal 
logics used for specifying programs and circuits. In particular, the only logical operator that
is  allowed  is  conjunction,  and  the  only  temporal  operator  is  next  time  (X).  However,  the 
logic  is  still  applicable  to  many  of  the  hardware  systems  that  appear  in  practice.  Bose  and 
Fisher  [2]  use  similar  techniques  to  verify  pipeline  circuits  with  respect  to  a  simpler  abstract 
model  by  means  of  a  representation  function,  in  analogy  to  abstract  data  type  verification. 
By limiting the class of formulas that they handle, these techniques can check certain properties
very efficiently. However, these restrictions are also a disadvantage compared to general
model  checking  algorithms.  The  number  of  time  units  that  a  formula  can  “look  ahead  in 
the  future”  is  bounded  by  the  maximum  nesting  of  X  operators.  There  is  no  analog  of  the 
until  operator  that  can  look  arbitrarily  far  into  the  future.  Consequently,  the  logic  is  not 
really  suitable  for  reasoning  about  nondeterministic  systems.  For  example,  at  high  levels 
of  abstraction,  computations  are  often  modeled  as  taking  an  arbitrary  but  finite  number 
of steps. It is not possible to verify that such a system will make progress using only the
X  operator. 
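To make the bounded look-ahead concrete, the following minimal Python sketch checks a formula of the restricted form against a finite prefix of a single execution. It assumes a hypothetical encoding in which each $p_i$ is a Python predicate on states and the trace is a list of states; the function name `holds` and this encoding are illustrative and do not come from the cited work.

```python
def holds(predicates, trace):
    """Check p_0 AND X p_1 AND ... AND X^n p_n at the start of `trace`.

    `predicates[i]` plays the role of p_i and `trace[i]` is the state
    reached after i steps (both are assumptions of this sketch).
    """
    if len(trace) < len(predicates):
        raise ValueError("trace is shorter than the formula's look-ahead")
    # Only the first len(predicates) states are ever inspected.
    return all(p(state) for p, state in zip(predicates, trace))
```

A formula with $n$ nested X operators inspects only the first $n + 1$ states of the trace, so no choice of predicates can state that an event eventually happens after an unbounded number of steps.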

It is difficult to accurately compare the performance of all of the symbolic verification
methods. We believe that the best comparison technique is to study how the CPU time
required for verification grows asymptotically with larger and larger instances of a class of
circuits. For all of the example circuits we have tried with our methods, this growth rate is a
small polynomial in the number of components of the circuit. Of the other groups mentioned
above, only Bryant, Beatty and Seger [8] have demonstrated good asymptotic performance
on a nontrivial class of circuits. Berthet, Coudert and Madre [1] did demonstrate verification
times that were sublinear in the number of states in the system, but these times were still
exponential in the number of components.

The remainder of the paper is organized as follows. After reviewing BDDs in section 2,
we show how to use BDDs to represent circuits in section 3. We describe algorithms for
finding reachable states and computing relational products in sections 4 and 5, respectively.
Symbolic algorithms for CTL model checking are described in section 6. Empirical results
are given for synchronous circuits in section 7 and for asynchronous circuits in section 8. We
close with some discussion in section 9.

2  Binary  Decision  Diagrams 

Ordered  binary  decision  diagrams  (BDDs)  are  a  canonical  form  representation  for  boolean 
formulas  [7].  They  are  often  substantially  more  compact  than  traditional  normal  forms  such 
as  conjunctive  normal  form  and  disjunctive  normal  form,  and  they  can  be  manipulated  very 
efficiently. Hence, they have become widely used for a variety of CAD applications, including
symbolic simulation, verification of combinational logic and, more recently, verification of
sequential  circuits.  A  BDD  is  similar  to  a  binary  decision  tree,  except  that  its  structure  is 
a  directed  acyclic  graph  rather  than  a  tree,  and  there  is  a  strict  total  order  placed  on  the 
occurrence  of  variables  as  one  traverses  the  graph  from  root  to  leaf.  Consider,  for  example, 
the BDD of figure 1. It represents the formula $(a \wedge b) \vee (c \wedge d)$, using the variable ordering
$a < b < c < d$. Given an assignment of boolean values to the variables $a$, $b$, $c$ and $d$, one can
decide whether the assignment makes the formula true by traversing the graph beginning at
the root and branching at each node based on the value assigned to the variable that labels
the node. For example, the assignment $(a \leftarrow 1, b \leftarrow 0, c \leftarrow 1, d \leftarrow 1)$ leads to a leaf node
labeled 1, hence the formula is true for this assignment.
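The traversal just described can be sketched in a few lines of Python. The tuple encoding of nodes is an assumption made for illustration, and the graph below is built by hand for this formula and variable ordering; it is expected, but not confirmed, to match the diagram in figure 1.

```python
# A node is a terminal (0 or 1) or a tuple (variable, low, high): `low` is
# followed when the variable is 0, `high` when it is 1. (Hypothetical encoding.)
# Hand-built graph for (a AND b) OR (c AND d) with ordering a < b < c < d.
D_NODE = ("d", 0, 1)
C_NODE = ("c", 0, D_NODE)       # shared by two parents, so this is a DAG, not a tree
B_NODE = ("b", C_NODE, 1)
ROOT = ("a", C_NODE, B_NODE)


def evaluate(node, assignment):
    """Walk from the root to a terminal, branching on each node's variable."""
    while isinstance(node, tuple):
        variable, low, high = node
        node = high if assignment[variable] else low
    return node


# The assignment from the text: a = 1, b = 0, c = 1, d = 1.
RESULT = evaluate(ROOT, {"a": 1, "b": 0, "c": 1, "d": 1})
```

Each evaluation visits at most one node per variable, which is why the ordering restriction makes evaluation linear in the number of variables.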

Figure 1: A binary decision diagram.

Bryant showed that given a variable ordering, there is a canonical BDD for every formula [7].
The size of the BDD can depend critically on the variable ordering. Bryant gives
algorithms for computing the BDD representations of $\neg f$ and $f \vee g$ given the BDDs for
formulas $f$ and $g$. These algorithms have complexity linear in the product of the sizes of the
argument BDDs. The only other operations which we require for the algorithms that follow
are quantification over boolean variables and substitution of variable names. Bryant gives
an algorithm for computing the BDD for a restricted formula of the form $f|_{v=0}$ or $f|_{v=1}$, i.e.,
$f$ with the variable $v$ set to 0 or 1. The restriction algorithm allows us to compute the BDD
for the formula $\exists v[f]$, where $v$ is a boolean variable and $f$ is a formula, as $f|_{v=0} \vee f|_{v=1}$.
The substitution of a variable $w$ for a variable $v$ in a formula $f$, denoted $f\langle v \leftarrow w \rangle$, can be
accomplished using quantification:

$$f\langle v \leftarrow w \rangle = \exists v[(v \Leftrightarrow w) \wedge f].$$

More efficient algorithms are possible, however, for the case of quantification over multiple
variables, or multiple renamings. In the latter case, efficiency depends on the ordering of
variables in the BDDs being the same on both sides of the substitution.
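The way restriction yields quantification, and quantification yields substitution, can be illustrated with a small Python sketch. It assumes boolean functions are represented as Python callables on dictionaries of variable values rather than as BDDs; the helper names `restrict`, `exists` and `substitute` are hypothetical.

```python
def restrict(f, var, value):
    """Return f with `var` fixed to `value` (the restricted formula)."""
    return lambda env: f({**env, var: value})


def exists(var, f):
    """Existential quantification as the disjunction of both restrictions."""
    f0, f1 = restrict(f, var, False), restrict(f, var, True)
    return lambda env: bool(f0(env) or f1(env))


def substitute(f, v, w):
    """Rename v to w in f by quantifying v out of (v <=> w) AND f."""
    return exists(v, lambda env: (env[v] == env[w]) and f(env))


# Example: f = a AND NOT b, then rename a to c.
f = lambda env: env["a"] and not env["b"]
g = substitute(f, "a", "c")
```

Here `g` behaves like the formula c AND NOT b. This is the naive one-variable-at-a-time route; the more efficient multi-variable algorithms mentioned above are not shown.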

Another way to view BDDs is as a form of deterministic finite automata. An $n$-argument
boolean function can be identified with the set of strings in $\{0,1\}^n$ that evaluate to 1.
Since this is a finite language and all finite languages are regular, there is a minimal finite
automaton that accepts this set. This automaton provides a canonical representation for
the original boolean function. Logical operations on boolean functions can be implemented
by set operations on the languages accepted by the finite automata. For example, AND
corresponds to set intersection. Standard constructions from elementary automata theory
can be used to compute these operations on languages. The standard BDD operations can
be viewed as analogs of these constructions.

3  Representing  Circuits 

We begin by describing how to represent circuits symbolically. This involves representing
sets of circuit states and deriving the transition relation of the circuit. Consider a circuit
with a set $V$ of state holding nodes. For a synchronous circuit, the set $V$ is typically the
outputs of all the registers in the circuit together with the primary inputs. In the case of an
asynchronous circuit, $V$ is usually the set of all nodes. A state of the circuit can be described
by giving values for all the nodes in $V$. Alternatively, if we create a boolean variable for each
node in $V$, then a state can be described by a valuation assigning either 0 or 1 to each variable.
Given a valuation, we can also write a boolean expression which is true for exactly that
valuation. For example, given $V = \{v_0, v_1, v_2\}$ and the valuation $(v_0 \leftarrow 1, v_1 \leftarrow 1, v_2 \leftarrow 0)$,
we derive the boolean formula $v_0 \wedge v_1 \wedge \neg v_2$. This boolean formula can then be represented
using a BDD. In general, a boolean formula may be true for many different valuations. If we
adopt the convention that a formula represents the set of all valuations that make it true,
then we can describe sets of states by boolean formulas and, hence, by BDDs. In practice,
BDDs are often much more efficient than representing sets of states explicitly. We denote
sets of states with the letter $S$, and we denote the BDD representing the set $S$ by $S(V)$,
where $V$ is the set of variables that the BDD may depend on.
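The convention that a formula stands for the set of valuations satisfying it can be sketched as follows. This is a minimal illustration that uses plain Python predicates in place of BDDs; the names `minterm` and `members` are hypothetical.

```python
from itertools import product

NAMES = ["v0", "v1", "v2"]  # one boolean variable per state holding node


def minterm(valuation):
    """Formula that is true for exactly one valuation, e.g. v0 AND v1 AND NOT v2."""
    return lambda state: all(state[name] == bit for name, bit in valuation.items())


def members(formula):
    """List every valuation of NAMES that makes `formula` true."""
    states = (dict(zip(NAMES, bits)) for bits in product((0, 1), repeat=len(NAMES)))
    return [state for state in states if formula(state)]


single = minterm({"v0": 1, "v1": 1, "v2": 0})   # stands for exactly one state
half = lambda state: state["v0"] == 1           # the formula v0 alone stands for four states
```

The short formula `half` describes four states at once, which hints at why a symbolic description can be much smaller than an explicit list of states.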

In addition to representing sets of states of a circuit, we must be able to represent the
transitions that the circuit can make. To do this, we extend the idea used above. Instead of
just representing a set of states using a BDD, we represent a set of ordered pairs of states.
We cannot do this using just a single copy of the state variables, so we create a second set
of variables $V'$. We think of the variables in $V$ as present state variables and the variables
in $V'$ as next state variables. Each variable $v$ in $V$ has a corresponding next state variable
in $V'$, which we denote by $v'$. A valuation for the variables in $V$ and $V'$ can be viewed
as designating an ordered pair of states in the circuit, and we can represent sets of these
valuations using BDDs as above. We refer to sets of pairs of states as transition relations. If
$N$ is a transition relation, then we write $N(V, V')$ to denote the BDD that represents it. We
always use an ordering for the BDD variables for which the present and next state variables
are interleaved and every present state variable $v$ is adjacent to its corresponding next state
variable $v'$.

3.1  Synchronous Circuits

The method for deriving the transition relation of a synchronous circuit can be illustrated
using a small example. The circuit in figure 2 is a modulo 8 counter. Let $V = \{v_0, v_1, v_2\}$
be the set of state variables for this circuit, and let $V' = \{v_0', v_1', v_2'\}$ be another copy of the
state variables. The transitions of the modulo 8 counter are given by

$$\begin{aligned}
v_0' &= \neg v_0, \\
v_1' &= v_0 \oplus v_1, \\
v_2' &= (v_0 \wedge v_1) \oplus v_2.
\end{aligned}$$

The above equations can be used to define the relations

$$\begin{aligned}
N_0(V, V') &= (v_0' \Leftrightarrow \neg v_0), \\
N_1(V, V') &= (v_1' \Leftrightarrow v_0 \oplus v_1), \\
N_2(V, V') &= (v_2' \Leftrightarrow (v_0 \wedge v_1) \oplus v_2),
\end{aligned}$$

which describe the constraints each $v_i'$ must satisfy in a legal transition. These constraints
can be combined by taking their conjunction to form the transition relation

$$N(V, V') = N_0(V, V') \wedge N_1(V, V') \wedge N_2(V, V').$$
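As a sanity check on the counter example, the following Python sketch encodes the three per-bit relations as predicates over a present state and a next state, takes their conjunction, and enumerates all state pairs. Explicit enumeration stands in for BDD operations here, and the helper names are hypothetical.

```python
from itertools import product


def n0(v, vp):
    return vp[0] == (not v[0])                    # v0' <=> NOT v0


def n1(v, vp):
    return vp[1] == (v[0] != v[1])                # v1' <=> v0 XOR v1


def n2(v, vp):
    return vp[2] == ((v[0] and v[1]) != v[2])     # v2' <=> (v0 AND v1) XOR v2


def transition(v, vp):
    """Conjunction of the per-bit relations."""
    return n0(v, vp) and n1(v, vp) and n2(v, vp)


def to_int(bits):
    """Read a tuple of booleans as an integer, with v0 as the least significant bit."""
    return sum(int(bit) << i for i, bit in enumerate(bits))


STATES = list(product((False, True), repeat=3))
PAIRS = sorted((to_int(v), to_int(vp)) for v in STATES for vp in STATES if transition(v, vp))
```

Of the 64 possible state pairs, exactly eight satisfy the conjunction, and each pairs a count with its successor modulo 8.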

Figure 2: Synchronous modulo 8 counter.

In the general case of a synchronous circuit with $n$ state holding nodes, we let
$V = \{v_0, \ldots, v_{n-1}\}$ and $V' = \{v_0', \ldots, v_{n-1}'\}$. Analogous to the modulo 8 counter, for each state
variable $v_i$ there is a function $f_i$ such that

$$v_i' = f_i(V).$$

These equations are used to define the relations

$$N_i(V, V') = (v_i' \Leftrightarrow f_i(V)).$$

Continuing the analogy with the modulo 8 counter, the conjunction of these relations forms
the transition relation

$$N(V, V') = N_0(V, V') \wedge \cdots \wedge N_{n-1}(V, V').$$

Thus, the transition relation for a synchronous circuit can be expressed as a conjunction of
relations.

Given a BDD for each function $f_i$, it is straightforward to compute the BDD that represents
$N$. We say such a transition relation is monolithic because it is represented by
a single BDD. Monolithic transition relations have been used successfully for CTL model
checking [12], but the primary bottleneck is the size of the BDD for the transition relation.

3.2  Asynchronous Circuits

As with synchronous circuits, the transition relation for an asynchronous circuit can be expressed
as a conjunction of relations. Alternatively, it can be expressed as a disjunction.
To simplify the description of how these forms of transition relation are obtained, we assume
that all the components of the circuit have exactly one output, and have no internal
state variables. In this case, it is possible to completely describe each component by a function
$f_i(V)$; given values for the present state variables $V$, the component drives its output
to the value specified by $f_i(V)$. For some components, such as C-elements and flip-flops,
the function $f_i(V)$ may depend on the current value of the output of the component, as
well as the inputs. Extending the method to handle components with multiple outputs is
straightforward.

In speed-independent asynchronous circuits, there can be an arbitrary delay between
when a transition is enabled and when it actually occurs. We can model this by allowing
each component to nondeterministically choose whether to transition its output. This results
in a conjunction of $n$ parts, all of the form

$$N_i(V, V') = (v_i' \Leftrightarrow f_i(V)) \vee (v_i' \Leftrightarrow v_i).$$

The above model for asynchronous circuits allows wires to transition concurrently. We
can also use an interleaving model, which allows only one wire to transition at a time. First,
we apply the distributive law to the conjunction of the $N_i$, giving a disjunction of $2^n$ terms.
Each of these terms corresponds to the simultaneous transitioning of some subset of the $n$
wires in the circuit. Second, we keep only those terms that correspond to exactly one wire
being allowed to transition. This results in a disjunction of the form

$$N(V, V') = N_0(V, V') \vee \cdots \vee N_{n-1}(V, V'),$$

where

$$N_i(V, V') = (v_i' \Leftrightarrow f_i(V)) \wedge \bigwedge_{j \neq i} (v_j' \Leftrightarrow v_j).$$

It is possible for an interleaving model of an asynchronous circuit to give a different set
of reachable states than a non-interleaving model. However, this does not occur for the
asynchronous circuits we verified in section 8.
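The difference between the concurrent and interleaving models can be seen on a toy example. The sketch below assumes a hypothetical two-node circuit in which each node is an inverter fed by its own output, so both nodes are always enabled; it enumerates successors explicitly instead of using BDDs.

```python
from itertools import product

# Hypothetical circuit: two independent self-inverting nodes (always enabled).
FUNCS = [lambda v: not v[0], lambda v: not v[1]]
N = len(FUNCS)


def concurrent(v, vp):
    """Conjunctive form: every node either takes its new value or holds."""
    return all(vp[i] == FUNCS[i](v) or vp[i] == v[i] for i in range(N))


def interleaved(v, vp):
    """Disjunctive form: one node takes its new value, all others hold."""
    return any(
        vp[i] == FUNCS[i](v) and all(vp[j] == v[j] for j in range(N) if j != i)
        for i in range(N)
    )


def successors(relation, v):
    """All next states related to v, found by brute-force enumeration."""
    return {vp for vp in product((False, True), repeat=N) if relation(v, vp)}
```

From the state where both nodes are 0, the concurrent model allows all four next states, while the interleaving model allows only the two in which a single node has changed. Every interleaved step is also a concurrent step, so the two models can differ in one-step successors even when their reachable state sets coincide.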

3.3  Partitioned Transition Relations

Monolithic transition relations are not the most efficient way to represent the possible transitions
of a circuit. Recall that the transition relations for synchronous and asynchronous
circuits have the form of conjunctions or disjunctions of a number of pieces $N_i(V, V')$. Each
of these pieces can typically be represented by a small BDD. In our experience, these BDDs
usually have fewer than 100 nodes, often many fewer; only very rarely do they have more
than 1000 nodes. Instead of forming the conjunction or disjunction of the $N_i(V, V')$, we can
represent the circuit by a list of these BDDs, which are implicitly conjoined or disjoined.
We call such a list a partitioned transition relation [10, 11].

For the conjunctive transition relations described above, the $N_i$ could be of the form

$$N_i(V, V') = (v_i' \Leftrightarrow f_i(V))$$

for synchronous circuits or

$$N_i(V, V') = (v_i' \Leftrightarrow f_i(V)) \vee (v_i' \Leftrightarrow v_i)$$

for asynchronous circuits, where $f_i$ is a transition function. It can be shown that the size of
the BDD for each $N_i$ is at most a constant factor larger than the BDD for $f_i$. In practice,
the difference in size is insignificant. Effectively, using partitioned transition relations to
represent a circuit requires no more BDD nodes than using transition functions.

For the disjunctive transition relations described above, the $N_i$ could be of the form

$$N_i(V, V') = (v_i' \Leftrightarrow f_i(V)) \wedge \bigwedge_{j \neq i} (v_j' \Leftrightarrow v_j).$$

In this case the BDD for $N_i$ is not guaranteed to be only a constant factor larger than the
BDD for $f_i$; it could be a factor of $n$ larger, where $n$ is the number of state variables of the
entire circuit. However, there is an additional technique for efficiently representing relations
of this form. Let

$$R_i(V, V') = (v_i' \Leftrightarrow f_i(V)).$$

Use the pair $(R_i(V, V'), i)$ to represent $N_i(V, V')$ with the interpretation that for all $v_j' \in V'$, if
$j = i$ then $v_j'$ is constrained by $R_i(V, V')$, otherwise $v_j'$ is constrained to be equal to $v_j$. Our
software for manipulating transition relations has been adapted to take advantage of this
representation.
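The pair representation can be sketched in Python as follows. This is only an illustration under simplifying assumptions: each $R_i$ is stored as the function $f_i$ itself rather than as a BDD, states are explicit tuples, and the helper names `apply_part` and `image` are hypothetical, not taken from the software mentioned above.

```python
def apply_part(part, state):
    """Successor of `state` under N_i, where N_i is stored as the pair (f_i, i)."""
    f_i, i = part
    following = list(state)
    following[i] = f_i(state)   # only v_i' is constrained by the stored relation
    return tuple(following)     # every other v_j' is implicitly equal to v_j


def image(parts, states):
    """Image of a state set under the implicit disjunction of all parts."""
    return {apply_part(part, state) for part in parts for state in states}


# Hypothetical two-node circuit: node 0 inverts itself, node 1 copies node 0.
PARTS = [(lambda s: not s[0], 0), (lambda s: s[0], 1)]
```

The identity constraints on the unchanged variables are never stored, which is the saving the pair representation is meant to capture.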

While  a  partitioned  transition  relation  with  one  BDD  for  each  state  variable  is  almost 
always  more  efficient  than  constructing  a  monolithic  transition  relation,  it  may  not  be  the 
best choice. As long as the BDDs do not become too large, it is better to combine some
of the $N_i(V, V')$ into one BDD by forming their conjunction or disjunction, as appropriate.
Fewer BDD nodes may be needed in this representation if the $N_i$ that are combined have
similar structure near the root of their BDDs. Combining some of the BDDs in a partitioned
transition relation can also speed up the algorithms for model checking and reachability analysis (see
section  5). 

4  Finding  Reachable  States 

Many of the ideas used in symbolic verification can be explained by considering the problem
of computing reachable state sets, since reachable state computations are at the heart of
model checking, state machine comparison, etc. Let $S_0$ be a set of states, represented by the
BDD $S_0(V)$. We wish to compute a BDD $S(V)$ that represents the states reachable from $S_0$
via the transitions in the transition relation $N$. We first consider the problem of finding
those states $S_1$ reachable in at most one step from $S_0$. This set of states is given by

$$S_1 = S_0 \cup \{\, s' \mid \exists s\, [s \in S_0 \wedge (s, s') \in N] \,\}.$$

Given the BDDs $S_0(V)$ and $N(V, V')$, we can compute a BDD representing $S_1$ by performing
the logical operations corresponding to the above expression:

$$S_1(V') = S_0(V') \vee \exists_{v \in V} [S_0(V) \wedge N(V, V')].$$

(The existential quantifier notation above indicates the existential quantification of all variables
$v$ in $V$.) Similarly, those states reachable in at most two steps are represented by

$$S_2(V') = S_1(V') \vee \exists_{v \in V} [S_1(V) \wedge N(V, V')].$$

In general, the states reachable in at most $k + 1$ steps are represented by

$$S_{k+1}(V') = S_k(V') \vee \exists_{v \in V} [S_k(V) \wedge N(V, V')].$$

Note that each set of states is a superset of the previous one. Since the total number of
states is finite, at some point we must have $S_{k+1} = S_k$. No further states are reachable, so
the set of all reachable states is represented by $S(V) = S_k(V)$.
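The iteration above can be sketched with explicit Python sets standing in for BDDs. The function name `reachable` and the `successors` callback, which returns the set of next states of a single state, are assumptions of this sketch.

```python
def reachable(initial, successors):
    """Iterate S_{k+1} = S_k union image(S_k) until S_{k+1} = S_k."""
    current = frozenset(initial)
    while True:
        image = {t for s in current for t in successors(s)}
        following = current | image      # a superset of the previous set
        if following == current:         # no new states: fixed point reached
            return current
        current = following
```

For example, `reachable({0}, lambda s: {(s + 1) % 8})` returns all eight states of a modulo 8 counter after eight iterations. With BDDs the loop has the same shape, but each set and the image step are symbolic.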

The above computation can be viewed as finding a least fixed point. A fixed point of a
function $f$ is some value $x$ such that $f(x) = x$. If we have an ordering on values, and $x$ is
the smallest fixed point under the order, then $x$ is the least fixed point of $f$. The greatest
fixed point is analogously defined. The functions that we will be interested in are functions
from sets of states to sets of states. We call such a function a predicate transformer. We
will use set containment as the ordering between sets of states. A predicate transformer $F$
is monotonic if $S \subseteq S'$ implies $F(S) \subseteq F(S')$. A basic result of fixed point theory is that
monotonic functions have a well-defined least fixed point and greatest fixed point.

Consider  the  predicate  transformer  F  defined  by 

F(S)  =  5o  U  {  s'  I  3a(s  €  5  A  (s,s')  6  N] }. 

If  we  represent  the  state  sets  by  BDDs,  the  function  F  can  be  viewed  as  specifying  a  sequence 
of  lopcal  operations  on  BDDs.  In  particular, 

(F(5))(n  =  So{r)v  3  [s(v)A^f(v,r)] 

•€V 

Note that $(F(S_i))(V') = S_{i+1}(V')$. Thus, applying $F$ represents one step in the reachability
computation. The sequence of state sets $\emptyset$, $S_0 = F(\emptyset)$, $S_1 = F^2(\emptyset) = F(F(\emptyset))$, etc.,
converges to the least fixed point of $F$ under the set containment ordering. This least fixed
point is exactly the set of reachable states. Direct iteration is the method of computing
the BDD $S(V')$ representing this fixed point by repeatedly computing $S_{i+1}(V')$ from $S_i(V')$.

The  predicate  transformer  F  also  has  a  greatest  fixed  point  under  set  inclusion.  This  fixed 
point  may  also  be  obtained  via  direct  iteration  by  starting  from  the  set  of  all  states. 
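
As a minimal illustration of direct iteration, the following Python sketch computes the least
fixed point of $F$ with explicit sets of states standing in for BDDs. The function names
`image` and `reachable`, and the three-bit counter used as the example system, are assumptions
made for illustration and do not come from the original.

```python
def image(states, relation):
    """Successors of `states` under `relation`, a set of (s, s') pairs."""
    return {t for (s, t) in relation if s in states}


def reachable(initial, relation):
    """Least fixed point of F(S) = S0 | image(S), found by direct iteration.

    Returns the reachable set and the number of applications of F that were
    needed before two consecutive iterates were equal.
    """
    current = set()                      # start from the empty set
    steps = 0
    while True:
        nxt = set(initial) | image(current, relation)   # F(current)
        steps += 1
        if nxt == current:               # S_{k+1} = S_k: fixed point reached
            return current, steps
        current = nxt


# Hypothetical example system: a 3-bit counter incremented modulo 8.
counter = {(i, (i + 1) % 8) for i in range(8)}
states, steps = reachable({0}, counter)
```

With a singleton initial set the counter needs one application of $F$ per state, which is the
behavior that motivates the faster methods discussed later in this section.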

Computing  fixed  points  of  predicate  transformers  similar  to  F  is  a  fundamental  step 
in  symbolic  verification,  so  it  is  worthwhile  to  examine  the  computational  complexity  of 
this problem. The direct iteration method involves repeatedly computing $(F(S_i))(V')$ and
checking the equivalence of $S_i(V')$ and $S_{i+1}(V')$ in order to determine whether a fixed point
has been reached. The time complexity of checking equivalence is either constant or linear
in the sizes of the BDDs representing the formulas, depending on the BDD implementation.
Most of the computational effort goes into computing $(F(S_i))(V')$. The most expensive step
of this is computing

$$\exists_{v \in V} \left[ S_i(V) \land N(V, V') \right].$$

This is an example of a relational product computation. Although relational products can
be computed using the normal BDD algorithms for restriction and boolean connectives, it
is much more efficient to use a special purpose algorithm. We will discuss this algorithm in
section 5.

4.1 Frontier Set Simplification

In order to perform reachability computations more efficiently, a technique called frontier
set simplification due to Coudert, Berthet, and Madre [18] is often used. Their technique
often reduces the size of the BDD representing the set of states on the "search frontier" (i.e.,
the set of states in $S_{i+1}$ but not in $S_i$). Consider the set of states $S_2$ described above:

$$S_2 = S_1 \cup \{\, s' \mid \exists s \, [\, s \in S_1 \land (s, s') \in N \,] \,\}.$$

Suppose that we step forwards from the states on the search frontier $S_1 - S_0$, i.e., we compute:

$$\{\, s' \mid \exists s \, [\, s \in (S_1 - S_0) \land (s, s') \in N \,] \,\}.$$

This yields a superset of $S_2 - S_1$ (it may also include some states in $S_1$). If we then add in
all the states in $S_1$, we will obtain $S_2$. Thus, the expression for $S_2$ can be rewritten as:

$$S_2 = S_1 \cup \{\, s' \mid \exists s \, [\, s \in S'_1 \land (s, s') \in N \,] \,\},$$

where $S'_1$ is the frontier $S_1 - S_0$. In fact, it is sufficient to choose any $S'_1$ satisfying
$S_1 - S_0 \subseteq S'_1 \subseteq S_1$. Given this freedom, we would like to choose $S'_1$ so that its BDD
representation is small. Coudert, Berthet, and Madre describe an algorithm for this. Their
procedure takes two BDDs $S(V)$ and $C(V)$ as input: we view these as a state set and a "care set". It
produces as output a BDD $S'(V)$ such that $S(V) \land C(V) = S'(V) \land C(V)$ (that is, $S(V)$
and $S'(V)$ agree for the states that we care about) and such that the BDD $S'(V)$ is usually
smaller than the BDD $S(V)$. Intuitively, we are simplifying the representation of the set $S$ by
adding or removing states not in $C$. In the example above, we would apply the simplification
algorithm to $S_1(V)$ using the complement of $S_0$ as the "care set".

Using this idea, the algorithm for computing the set of reachable states is modified as
follows. First, let $S'_0$ be equal to $S_0$. The set $S_{k+1}$ of states reachable in $k + 1$ or fewer steps
is given by

$$S_{k+1} = S_k \cup \{\, s' \mid \exists s \, [\, s \in S'_k \land (s, s') \in N \,] \,\},$$

where $S'_k$ is the result of simplifying the set $S_k$ relative to the "care set" given by the
complement of $S_{k-1}$. Notice that using frontier set simplification does not result in a memory
savings; all of the BDDs in the original reachability algorithm are still computed. In fact,
memory usage can increase since the BDDs for the $S'_k$ must be computed. The potential
advantage of frontier set simplification is that smaller BDDs are used in the relational product
computation. In practice, frontier set simplification usually results in an insignificant increase
in memory usage and a significant constant factor decrease in computation time.
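
The frontier idea can be sketched with explicit sets, under the assumption that plain Python
sets stand in for BDDs (so the size reduction obtained by simplification is not modeled, only
the freedom in choosing the frontier). The names `reachable_with_frontier` and `choose` are
hypothetical. Any `choose` that returns a set between the new states and the reached states
yields the same result.

```python
def image(states, relation):
    """Successors of `states` under `relation`, a set of (s, s') pairs."""
    return {t for (s, t) in relation if s in states}


def reachable_with_frontier(initial, relation, choose=lambda new, reached: new):
    """Reachability that steps only from a chosen frontier set.

    `choose(new, reached)` must return a set F with new <= F <= reached;
    the default picks the exact frontier S_k - S_{k-1}.
    """
    reached = set(initial)               # S_k
    new = set(initial)                   # S_k - S_{k-1}
    while new:
        frontier = choose(new, reached)
        new = image(frontier, relation) - reached
        reached |= new
    return reached


# Hypothetical example graph with one component unreachable from state 0.
graph = {(0, 1), (0, 2), (1, 3), (2, 3), (3, 0), (4, 5)}
exact = reachable_with_frontier({0}, graph)
widest = reachable_with_frontier({0}, graph, choose=lambda new, reached: reached)
```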

4.2  Iterative  Squaring 

One potential problem with reachability computations is that the number of iterations
needed to find a fixed point may be exponential in the number of components of the system.
We have studied a method for computing fixed points called iterative squaring that
can drastically reduce the number of iterations needed [12, 13, 14]. The direct iteration
algorithm computes the least fixed point of $F$ by computing $F(\emptyset)$, $F^2(\emptyset)$, $F^3(\emptyset)$, etc., until a
fixed point is reached. Iterative squaring depends on noting that the predicate transformer
$F^2$, which is given by

$$F^2(S) = S_0 \cup \{\, s' \mid \exists s \, [\, s \in S \land ((s, s') \in N \lor \exists s'' \, [\, (s, s'') \in N \land (s'', s') \in N \,]) \,] \,\},$$

is of the same form as $F$; the difference is that $N$ has been replaced by

$$N \cup \{\, (s, s') \mid \exists s'' \, [\, (s, s'') \in N \land (s'', s') \in N \,] \,\}.$$

The BDD representation of this relation can be computed as

$$N(V, V') \lor \exists_{v'' \in V''} \left[ N(V, V'') \land N(V'', V') \right].$$

We call this operation squaring $N$. Let $N_0$ denote $N$, and let $N_{i+1}$ be the square of $N_i$. The
predicate transformer $F^{2^i}$ is equal to

$$F^{2^i}(S) = S_0 \cup \{\, s' \mid \exists s \, [\, s \in S \land (s, s') \in N_i \,] \,\}.$$

By repeated squaring starting from $N$, we eventually reach a fixed point $N_k$ which is the
transitive closure of $N$. Using $N_k$ to compute $F^{2^k}(\emptyset)$ gives the least fixed point of $F$ directly.
The  number  of  steps  needed  to  compute  the  fixed  point  with  this  method  is  logarithmic  in 
the  number  of  steps  needed  with  direct  iteration  (assuming  the  diameter  of  the  state  graph 
is  not  reduced  when  restricted  to  reachable  states).  Note,  however,  that  this  approach  may 
be  impractical  if  the  BDDs  needed  to  represent  the  intermediate  computations  become  too 
large.  Unfortunately,  this  appears  to  be  the  normal  case  in  practice.  In  our  experience, 
iterative  squaring  has  been  more  efficient  than  direct  iteration  only  on  extremely  simple 
examples  such  as  counters. 
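
A small explicit-state sketch of squaring follows, again assuming Python sets of state pairs in
place of BDDs. The names `square` and `closure_by_squaring` are hypothetical. In this sketch the
reachable set is obtained by taking the initial states together with their image under the
closed relation, which is one straightforward way to use $N_k$.

```python
def square(relation):
    """N | (N composed with N): pairs connected by one or two steps."""
    successors = {}
    for s, t in relation:
        successors.setdefault(s, set()).add(t)
    two_step = {(s, u) for (s, t) in relation for u in successors.get(t, ())}
    return set(relation) | two_step


def closure_by_squaring(relation):
    """Square repeatedly until the relation stops growing.

    Returns the transitive closure and the number of squarings that
    actually enlarged the relation.
    """
    current = set(relation)
    squarings = 0
    while True:
        nxt = square(current)
        if nxt == current:
            return current, squarings
        current = nxt
        squarings += 1


# Hypothetical example: the same 3-bit counter, 8 states in a single cycle.
counter = {(i, (i + 1) % 8) for i in range(8)}
closure, squarings = closure_by_squaring(counter)
reach = {0} | {t for (s, t) in closure if s == 0}
```

For this counter three squarings suffice where direct iteration needs eight steps, but the
closed relation contains every pair of states, which hints at why the intermediate
representations can become large.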

4.3 Invariants

Invariant checking is a standard method for verifying safety properties of systems. In our
context, invariant checking requires defining a set of states $W$ to be an invariant and defining
a set $Z_0$ of "bad" states, which are states that the circuit being verified should not enter. It
must then be verified that

1. the initial state (or states) is contained in $W$,

2. all states reachable from $W$ are contained in $W$, and

3. $W$ and $Z_0$ are disjoint.

Clearly these conditions are sufficient for showing that none of the states in $Z_0$ are reachable.

The exact definition of $Z_0$ depends on the correctness criteria for the circuit. For an
asynchronous circuit, $Z_0$ might be the set of states in which a hazard can occur. The method
can also be used to check whether two synchronous circuits have equivalent input/output
behavior. In this case $Z_0$ is the set of global states (ordered pairs $(s_0, s_1)$ where $s_0$ and $s_1$
are each states of the respective circuits) where the two circuits have different outputs.

In  verification  methods  that  use  theorem  provers,  the  invariant  W  is  typically  represented 
by  a  formula  in  some  appropriate  logic.  The  user  must  usually  expend  a  lot  of  effort  to 
discover  the  invariant.  One  of  the  main  advantages  of  using  finite  state  methods  is  that 
an  invariant  can  be  constructed  automatically  without  any  user  intervention.  To  be  more 
specific, let $S_0$ be the set of initial states of a circuit, then compute the set $S$ of states
reachable from $S_0$. If $W = S$, then $W$ clearly satisfies requirements 1 and 2. Once $S$ is
computed, it is easy to determine whether the third requirement is satisfied. If the third
requirement is not satisfied, then the circuit is incorrect and there does not exist a set of
states $W$ that satisfies all three requirements.

An obvious refinement is to check whether any states in $Z_0$ are reachable while $S$ is
being computed, instead of waiting until the computation is complete. Also, it is possible to
represent $Z_0$ as an implicit disjunction of several BDDs, analogous to partitioned transition
relations. This can significantly reduce the number of BDD nodes needed to represent $Z_0$,
and does not complicate requirement 3. This representation of $Z_0$ is very helpful
when checking for hazards in asynchronous circuits. The set $S_0$ can also be represented by
an implicit disjunction of BDDs, but this requires computing a different set of reachable
states for each of the BDDs used, and is not helpful for typical sets of initial states.
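
The refinement of checking for bad states during the search can be sketched as follows. This is
an explicit-state illustration only: Python sets replace BDDs, the bad set is passed as a list
of parts to mimic an implicit disjunction, and the name `bad_states_unreachable` and the
six-state ring example are assumptions.

```python
def image(states, relation):
    """Successors of `states` under `relation`, a set of (s, s') pairs."""
    return {t for (s, t) in relation if s in states}


def bad_states_unreachable(initial, relation, bad_parts):
    """Forward search that stops as soon as a bad state is found.

    `bad_parts` is a list of sets whose union is the bad set Z0; the union
    itself is never built.
    """
    reached = set(initial)
    frontier = set(initial)
    while frontier:
        if any(frontier & part for part in bad_parts):
            return False                 # some state of Z0 is reachable
        frontier = image(frontier, relation) - reached
        reached |= frontier
    return True                          # S and Z0 are disjoint


# Hypothetical example: six states arranged in a ring; 6 and 7 never occur.
ring = {(i, (i + 1) % 6) for i in range(6)}
safe = bad_states_unreachable({0}, ring, [{6}, {7}])
unsafe = bad_states_unreachable({0}, ring, [{6}, {3}])
```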

In some cases, however, computing the states reachable from the initial states is too
inefficient. For example, consider verifying a circuit that contains an $n$-bit counter that is
incremented every cycle. If the counter is set to the same value in all of the initial states,
then computing $S$ will require on the order of $2^n$ steps. We describe two methods, which do not
involve iterative squaring, for speeding up the computation of an invariant.

The first method involves computing an invariant from $Z_0$ rather than from $S_0$. Compute,
by reverse reachability analysis, the set $Z$ of all states from which some state in $Z_0$ can be
reached. Thus, $Z$ is the set of all states that can reach a "bad" state. If $W$ is the complement
of $Z$, then $W$ satisfies requirements 2 and 3. The circuit is correct if and only if $W$ also
satisfies the first requirement, which is equivalent to $S_0 \cap Z = \emptyset$. Thus, the difference
between forward and reverse reachability analysis is that the transition relation is reversed,
and the roles of $S_0$ and $Z_0$ are swapped. Reverse reachability analysis has been studied
by Filkorn [24], and it can be viewed as a generalization of earlier methods for finding
equivalent states in finite state machines [26, 32].

In  some  cases,  an  invariant  can  be  computed  much  more  quickly  with  reverse  reachability 
analysis  than  forward,  even  if  both  methods  compute  the  same  invariant.  As  an  extreme 
example, consider using reachability analysis to verify that two identical $n$-bit counters have
the same input/output behavior. The set $Z_0$ is the set of states where the two counters
have different outputs. If $S_0$ is a singleton set, then $2^n$ steps will be required to compute $S$.
However,  Z  can  be  computed  very  quickly  since  Z  =  Zo  in  this  case. 
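
The two-counter example can be replayed on explicit states. The sketch below assumes that the
output of each counter is simply its value and that both counters increment on every step; the
names `preimage` and `backward_reachable` are hypothetical.

```python
def preimage(states, relation):
    """Predecessors of `states` under `relation`, a set of (s, s') pairs."""
    return {s for (s, t) in relation if t in states}


def backward_reachable(bad, relation):
    """All states from which some state in `bad` can be reached.

    Returns the set Z and the number of preimage computations performed.
    """
    reached = set(bad)
    iterations = 0
    while True:
        new = preimage(reached, relation) - reached
        iterations += 1
        if not new:
            return reached, iterations
        reached |= new


# Product of two identical 3-bit counters; a state is a pair of counter values.
size = 2 ** 3
product_relation = {((a, b), ((a + 1) % size, (b + 1) % size))
                    for a in range(size) for b in range(size)}
bad = {(a, b) for a in range(size) for b in range(size) if a != b}

can_reach_bad, iterations = backward_reachable(bad, product_relation)
circuit_ok = not ({(0, 0)} & can_reach_bad)   # requirement 1: S0 and Z disjoint
```

Here a single preimage computation confirms that $Z = Z_0$, whereas a forward search from
the state $(0, 0)$ would visit all eight diagonal states one at a time.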

For the second method for speeding up the computation of invariants, notice that automatic
computation of invariants and user construction of invariants are just two ends of a
continuum. We will only describe the forward reachability case here, but the idea can also
be applied in reverse using the duality described above. The user must choose a set $T_0$ of
states such that $S_0 \subseteq T_0$; then the set $T$ of states reachable from $T_0$ is computed. If $T_0$ is
chosen well, then $T$ can be computed from $T_0$ more quickly (in fewer iterations) than $S$ can
be computed from $S_0$. Usually $T_0 \subseteq S$, in which case $T = S$. However, this need not be
the case. All that is required for the verification to go through is that $T \cap Z_0 = \emptyset$. We use
this idea in the verification of an asynchronous stack circuit (see section 8.2). Rather than
starting the reachability search from the set $S_0$ of initial states (where the stack is empty),
we set $T_0$ to the set of all possible quiescent states of the stack. This significantly reduces
the number of iterations necessary to reach a fixed point.
5 Computing Relational Products

As noted earlier, computing relational products is a fundamental operation in many symbolic
verification methods. This section describes the techniques that we use for relational product
computations.

5.1 Basic Algorithm

Consider the following relational product:

$$S'(V') = \exists_{v \in V} \left[ S(V) \land N(V, V') \right].$$

In figure 3, we give a special BDD algorithm RelProd that performs this computation in one
pass over the BDDs $S(V)$ and $N(V, V')$. This is important in practice since the relational
product is computed without ever constructing the BDD for

$$S(V) \land N(V, V'),$$

which is often fairly large. The basic idea behind the algorithm is to perform the normal
conjunction, except that every time we would build a node labeled with an element of $V$, we
perform a disjunction. The BDD $S'(V')$ is computed with the call RelProd($S(V)$, $N(V, V')$, $V$).

Like many BDD algorithms, RelProd uses a result cache. In this case, entries in the
cache are of the form $(f, g, E, h)$, where $E$ is a set of variables and $f$, $g$ and $h$ are BDDs. If
such an entry is in the cache, it means that a previous call to RelProd($f$, $g$, $E$) returned $h$ as
its result.

The algorithm as presented is independent of assumptions about the BDD variable ordering.
In our implementation, it has been optimized for the case where the present state
and next state variables are interleaved, with corresponding present and next state variables
adjacent to each other.

The algorithm, while working well in practice (assuming $N(V, V')$ is reasonably sized),
has exponential complexity in the worst case. Most of the situations where this complexity
is observed are cases in which the output $S'(V')$ is exponentially larger than the inputs
$S(V)$ and $N(V, V')$. In such situations, any method of computing $S'(V')$ must have
exponential complexity.

The basic relational product algorithm requires having $N(V, V')$ be a monolithic transition
relation, consisting of a single BDD. We saw in section 3 how to construct this BDD for
synchronous and asynchronous circuits. Unfortunately, for many practical examples, this
BDD is very large. Partitioned transition relations can provide a much more concise
representation, but they cannot be used with the basic relational product algorithm. In the next
two subsections, we show how to extend the basic algorithm to compute relational products
for partitioned transition relations.


function RelProd(f, g: BDD; E: set of variables): BDD
    if f = false ∨ g = false
        return false
    else if f = true ∧ g = true
        return true
    else if (f, g, E, h) is in the result cache
        return h
    else
        let x be the top variable of f
        let y be the top variable of g
        let z be the topmost of x and y
        h0 := RelProd(f|z=0, g|z=0, E)
        h1 := RelProd(f|z=1, g|z=1, E)
        if z ∈ E
            h := Or(h0, h1)              /* BDD for h0 ∨ h1 */
        else
            h := IfThenElse(z, h1, h0)   /* BDD for (z ∧ h1) ∨ (¬z ∧ h0) */
        endif
        insert (f, g, E, h) in the result cache
        return h
    endif


Figure 3: Relational product algorithm
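
The pseudocode of Figure 3 can be exercised on a toy BDD representation. The Python sketch
below is not the authors' implementation: it assumes that a BDD node is a tuple
`(var, low, high)` with integer variable indices (smaller index nearer the root), that Python
`True` and `False` are the terminals, and that `functools.lru_cache` plays the role of the
result cache. The helper `from_predicate` is a hypothetical, exponential-time constructor used
only to build small test inputs.

```python
from functools import lru_cache


def mk(var, low, high):
    """Build a reduced node: drop a test whose two branches coincide."""
    return low if low == high else (var, low, high)


def top(f):
    """Index of the top variable, or infinity for a terminal."""
    return f[0] if isinstance(f, tuple) else float("inf")


def cofactors(f, z):
    """(f with z = 0, f with z = 1), for z at or above the top variable of f."""
    if isinstance(f, tuple) and f[0] == z:
        return f[1], f[2]
    return f, f


@lru_cache(maxsize=None)
def bdd_or(f, g):
    if f is True or g is True:
        return True
    if f is False:
        return g
    if g is False:
        return f
    z = min(top(f), top(g))
    f0, f1 = cofactors(f, z)
    g0, g1 = cofactors(g, z)
    return mk(z, bdd_or(f0, g0), bdd_or(f1, g1))


@lru_cache(maxsize=None)
def rel_prod(f, g, E):
    """BDD for (exists E. f and g); E is a frozenset of variable indices."""
    if f is False or g is False:
        return False
    if f is True and g is True:
        return True
    z = min(top(f), top(g))
    f0, f1 = cofactors(f, z)
    g0, g1 = cofactors(g, z)
    h0 = rel_prod(f0, g0, E)
    h1 = rel_prod(f1, g1, E)
    if z in E:
        return bdd_or(h0, h1)            # quantify z instead of building a node
    return mk(z, h0, h1)


def from_predicate(pred, nvars, assignment=()):
    """BDD of a Python predicate over variables 0..nvars-1 (test helper only)."""
    if len(assignment) == nvars:
        return bool(pred(assignment))
    i = len(assignment)
    return mk(i, from_predicate(pred, nvars, assignment + (0,)),
              from_predicate(pred, nvars, assignment + (1,)))


# 2-bit counter with interleaved order: 0 = v0, 1 = v0', 2 = v1, 3 = v1'.
S = from_predicate(lambda a: a[0] == 1 and a[2] == 0, 4)       # state "1"
N = from_predicate(lambda a: a[1] == 1 - a[0] and a[3] == (a[0] ^ a[2]), 4)
expected = from_predicate(lambda a: a[1] == 0 and a[3] == 1, 4)  # state "2"
result = rel_prod(S, N, frozenset({0, 2}))
```

Because nodes are reduced and compared by value, two BDDs for the same function under the same
ordering are equal tuples, so the result can be compared directly against `expected`.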


5.2 Disjunctive Partitioning

For a disjunctive partitioned transition relation, the relational product computed is of the
form

$$S'(V') = \exists_{v \in V} \left[ S(V) \land \left( N_0(V, V') \lor \cdots \lor N_{n-1}(V, V') \right) \right].$$

This relational product can be computed without ever constructing the BDD for the full
transition relation by distributing the existential quantification over the disjunctions:

$$S'(V') = \exists_{v \in V} \left[ S(V) \land N_0(V, V') \right] \lor \cdots \lor \exists_{v \in V} \left[ S(V) \land N_{n-1}(V, V') \right].$$

Thus, we are able to reduce the problem of computing $S'(V')$ to one of computing a series
of relational products involving relatively small BDDs. Much larger asynchronous circuits
can be verified using this representation than with a monolithic transition relation.
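
The distribution of quantification over disjunction corresponds, on explicit sets, to taking
the union of the images under each part. The sketch below assumes Python sets in place of BDDs;
`image_disjunctive` and the two-component interleaving example are hypothetical.

```python
def image(states, relation):
    """Successors of `states` under `relation`, a set of (s, s') pairs."""
    return {t for (s, t) in relation if s in states}


def image_disjunctive(states, parts):
    """Image under the union of `parts` without ever forming that union."""
    result = set()
    for part in parts:
        result |= image(states, part)    # one small relational product per part
    return result


# Hypothetical asynchronous system: two one-bit components, one moves per step.
all_states = [(a, b) for a in (0, 1) for b in (0, 1)]
n0 = {((a, b), (1 - a, b)) for (a, b) in all_states}   # component 0 toggles
n1 = {((a, b), (a, 1 - b)) for (a, b) in all_states}   # component 1 toggles

partitioned = image_disjunctive({(0, 0)}, [n0, n1])
monolithic = image({(0, 0)}, n0 | n1)
```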

5.3 Conjunctive Partitioning

When using a conjunctive partitioned transition relation, the relational product computed
is of the form

$$S'(V') = \exists_{v \in V} \left[ S(V) \land \left( N_0(V, V') \land \cdots \land N_{n-1}(V, V') \right) \right]. \qquad (2)$$

The main difficulty in computing $S'(V')$ without building the conjunction is that existential
quantification does not distribute over conjunction. The method given below overcomes this
difficulty.

Our technique [10, 11] is based on two observations. First, circuits exhibit locality, so
many of the $N_i(V, V')$ will depend on only a small number of the variables in $V$ and $V'$.
Second, although existential quantification does not distribute over conjunction, subformulas
can be moved out of the scope of existential quantification if they do not depend on any of
the variables being quantified. We will take advantage of these observations by conjuncting
the $N_i(V, V')$ with $S(V)$ one at a time and using "early quantification" to quantify out each
variable $v$ when none of the remaining $N_i(V, V')$ depend on $v$.

Consider the modulo 8 counter described in section 3.1. In this case,

$$S'(V') = \exists v_0 \, \exists v_1 \, \exists v_2 \left[ S(V) \land \left( N_0(V, V') \land N_1(V, V') \land N_2(V, V') \right) \right].$$

Since conjunction is commutative and associative, we can rewrite this as

$$S'(V') = \exists v_0 \, \exists v_1 \, \exists v_2 \left[ \left( \left( S(V) \land N_2(V, V') \right) \land N_1(V, V') \right) \land N_0(V, V') \right]. \qquad (3)$$

The reasons for computing the conjunctions in this particular order will become clear
momentarily. As mentioned above, subformulas can be moved out of the scope of existential
quantification if they do not depend on any of the variables being quantified. According to
equation 1, $N_0(V, V')$ does not depend on $v_1$ or $v_2$; thus,

$$S'(V') = \exists v_0 \left[ \exists v_1 \, \exists v_2 \left[ \left( S(V) \land N_2(V, V') \right) \land N_1(V, V') \right] \land N_0(V, V') \right].$$

Since $N_1(V, V')$ does not depend on $v_2$, we can apply this trick one more time by writing

$$S'(V') = \exists v_0 \left[ \exists v_1 \left[ \exists v_2 \left[ S(V) \land N_2(V, V') \right] \land N_1(V, V') \right] \land N_0(V, V') \right].$$

We can now compute the relational product in equation 2 by starting with $S(V)$ and at each
step combining the previous result with an $N_i(V, V')$ and quantifying out the appropriate
variables. Thus, we have reduced the problem of computing the full relational product to one
of performing a series of smaller relational product-like steps. Notice that the intermediate
results may depend both on variables in $V$ and variables in $V'$.

Now we can explain why we chose the ordering of conjuncts given in equation 3. We wish
to order the $N_i(V, V')$ so that the variables in $V$ can be quantified out as soon as possible
and the variables in $V'$ are added as slowly as possible. This is desirable since it reduces the
number of variables that the intermediate BDDs depend on and hence can greatly reduce the
size of these BDDs. In this particular example, the variables in $V'$ are added one at a time,
independent of the ordering of the $N_i(V, V')$. Thus, the optimum ordering for the $N_i(V, V')$
is determined by how quickly the variables in $V$ can be quantified out. For each of the
variables $v_i$ in $V$, consider the number of terms that depend on $v_i$: all 4 terms depend on $v_0$,
while 3 terms depend on $v_1$, and 2 terms depend on $v_2$. Thus, $v_2$ is the best candidate for a
variable to quantify out early. This explains why we chose to combine $S(V)$ and $N_2(V, V')$,
the two terms that depend on $v_2$, as the first step in the computation. Similarly, $N_1(V, V')$
was chosen next because it was the only remaining term that depended on $v_1$.
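
Early quantification for the modulo 8 counter can be checked on small truth tables. In the
sketch below a formula is a pair (set of variables, set of satisfying assignments), so that
conjunction is a natural join and existential quantification is a projection. This
representation, the helper names `table`, `conj` and `exists`, and the concrete counter
relations (the usual binary increment, assumed here to match equation 1) are all assumptions
made for illustration.

```python
from itertools import product


def table(variables, predicate):
    """Satisfying assignments of `predicate` over the named 0/1 variables."""
    rows = set()
    for values in product((0, 1), repeat=len(variables)):
        env = dict(zip(variables, values))
        if predicate(env):
            rows.add(frozenset(env.items()))
    return (frozenset(variables), rows)


def conj(t1, t2):
    """Conjunction: join rows that agree on the shared variables."""
    vars1, rows1 = t1
    vars2, rows2 = t2
    shared = vars1 & vars2
    rows = set()
    for r1 in rows1:
        d1 = dict(r1)
        for r2 in rows2:
            d2 = dict(r2)
            if all(d1[v] == d2[v] for v in shared):
                rows.add(r1 | r2)
    return (vars1 | vars2, rows)


def exists(names, t):
    """Existential quantification: project the named variables away."""
    variables, rows = t
    keep = variables - set(names)
    return (keep, {frozenset((v, x) for (v, x) in r if v in keep) for r in rows})


N0 = table(("v0", "v0'"), lambda e: e["v0'"] == 1 - e["v0"])
N1 = table(("v0", "v1", "v1'"), lambda e: e["v1'"] == e["v0"] ^ e["v1"])
N2 = table(("v0", "v1", "v2", "v2'"),
           lambda e: e["v2'"] == (e["v0"] & e["v1"]) ^ e["v2"])
S = table(("v0", "v1", "v2"),
          lambda e: (e["v0"], e["v1"], e["v2"]) == (1, 1, 0))   # counter value 3

# Build the whole conjunction first, then quantify.
monolithic = exists(["v0", "v1", "v2"], conj(S, conj(N0, conj(N1, N2))))

# Equation 3 ordering: quantify each variable as soon as no later term uses it.
step1 = exists(["v2"], conj(S, N2))
step2 = exists(["v1"], conj(step1, N1))
early = exists(["v0"], conj(step2, N0))
```

Both computations yield the single successor state with value 4, and no intermediate result in
the early-quantification version mentions more than three variables.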

The above example involved computing the relational product in a forward reachability
search. Computing relational products for backward reachability search is quite similar
to the forward reachability case described above. However, instead of quantifying out the
present state variables when performing the relational product, we quantify out the next
state variables. This change may affect the optimal ordering of the $N_i(V, V')$ when using
conjunctive partitioning. To illustrate this, we consider the modulo 8 counter again. The
relational product that we want to compute has the form

$$S'(V) = \exists v'_0 \, \exists v'_1 \, \exists v'_2 \left[ S(V') \land \left( N_0(V, V') \land N_1(V, V') \land N_2(V, V') \right) \right].$$

We rewrite this as

$$S'(V) = \exists v'_0 \, \exists v'_1 \, \exists v'_2 \left[ \left( \left( S(V') \land N_0(V, V') \right) \land N_1(V, V') \right) \land N_2(V, V') \right]. \qquad (4)$$

Since, according to equation 1, $N_2(V, V')$ does not depend on $v'_0$ or $v'_1$,

$$S'(V) = \exists v'_2 \left[ \exists v'_0 \, \exists v'_1 \left[ \left( S(V') \land N_0(V, V') \right) \land N_1(V, V') \right] \land N_2(V, V') \right].$$

Since $N_1(V, V')$ does not depend on $v'_0$, we obtain

$$S'(V) = \exists v'_2 \left[ \exists v'_1 \left[ \exists v'_0 \left[ S(V') \land N_0(V, V') \right] \land N_1(V, V') \right] \land N_2(V, V') \right].$$

In this particular example, the number of new state variables $v'_i$ in the intermediate BDDs
is independent of the ordering of the $N_i(V, V')$. However, the number of old state variables
$v_i$ depends on the ordering, and is minimized by the ordering given in equation 4. Note that
this ordering is different from the one in equation 3.
The method described above for computing the relational product for the modulo 8
counter can be generalized to an arbitrary conjunctive partitioned transition relation with
$n$ state variables, as follows. The user must choose a permutation $\rho$ of $\{0, \ldots, n - 1\}$. This
permutation determines the order in which the partitions $N_i(V, V')$ are combined. For each $i$,
let $D_i$ be the set of variables in $V$ that $N_i(V, V')$ depends on. Also, let

$$E_i = D_{\rho(i)} - \bigcup_{k=i+1}^{n-1} D_{\rho(k)}.$$

Thus, $E_i$ is the set of variables contained in $D_{\rho(i)}$ that are not contained in $D_{\rho(k)}$ for any $k$
larger than $i$. The $E_i$ are pairwise disjoint and their union is equal to $V$. The relational
product in equation 2 can be computed as

$$\begin{aligned}
S_1(V, V') &= \exists_{v \in E_0} \left[ S(V) \land N_{\rho(0)}(V, V') \right] \\
S_2(V, V') &= \exists_{v \in E_1} \left[ S_1(V, V') \land N_{\rho(1)}(V, V') \right] \\
&\;\;\vdots \\
S'(V') &= \exists_{v \in E_{n-1}} \left[ S_{n-1}(V, V') \land N_{\rho(n-1)}(V, V') \right].
\end{aligned}$$

The ordering $\rho$ has a significant impact on how early in the computation state variables can
be quantified out. This affects the size of the BDDs constructed and the efficiency of the
verification procedure. Thus, it is important to choose $\rho$ carefully, just as with the BDD
variable ordering. In practice, we have found it fairly easy to come up with orderings which
give good results.
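
The sets $E_i$ follow mechanically from the dependency sets and the chosen permutation. A
minimal sketch, assuming dependency sets are given as Python sets of variable names and the
permutation as a list (the function name `quantification_schedule` is hypothetical):

```python
def quantification_schedule(deps, rho):
    """Return [E_0, ..., E_{n-1}] for dependency sets `deps` and order `rho`.

    deps[i] is D_i, the present state variables that partition N_i depends on;
    rho[i] is the index of the partition combined at step i.
    """
    n = len(rho)
    schedule = []
    for i in range(n):
        later = set()
        for k in range(i + 1, n):
            later |= deps[rho[k]]        # variables still needed after step i
        schedule.append(set(deps[rho[i]]) - later)
    return schedule


# Dependency sets of the modulo 8 counter partitions N_0, N_1, N_2.
deps = [{"v0"}, {"v0", "v1"}, {"v0", "v1", "v2"}]
good = quantification_schedule(deps, [2, 1, 0])   # ordering of equation 3
poor = quantification_schedule(deps, [0, 1, 2])
```

With the ordering of equation 3 one variable is quantified at every step, while the reverse
ordering postpones all quantification to the final step.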

5.4  Recombining  Partitions 

Earlier, we described how a circuit could be represented by a set of transition relations
$N_i(V, V')$, each depending on exactly one variable in $V'$. We also pointed out that combining
some of the $N_i$ together into one BDD can result in a smaller representation. Combining
parts of a transition relation in this way can also significantly speed up the computation of
relational products.

For example, consider the case of an $n$-bit counter. With the usual variable ordering,
the number of BDD nodes needed to represent the transition relation is linear in $n$ in both
the monolithic and fully partitioned cases. Suppose $S(V)$ represents a singleton state set
of the counter. Computing $S'(V')$ with the fully partitioned representation requires $n$ BDD
operations, each of which has complexity $O(n)$, for a total complexity of $O(n^2)$. On the
other hand, if we use the monolithic relation, we perform one operation of complexity $O(n)$,
a saving in time of a factor of $n$. In practice, we can often get a speed up by combining
all of the BDDs for any given register, without significantly increasing the number of BDD
nodes in the transition relation.

The empirical results in sections 7 and 8 also illustrate the benefits of recombining
partitions. In particular, for the KEY benchmark (section 7.2), recombining gave a factor of
25 speed up. The MINMAX example shows how recombining can give a major reduction in
the space needed for the transition relation, as well as a significant speed up.

6 Symbolic Model Checking

6.1 Computation Tree Logic

The logic that we use to specify circuits is a propositional temporal logic of branching time,
called CTL or Computation Tree Logic [17]. In this logic each of the usual forward-time
operators of linear temporal logic (G globally or invariantly, F sometime in the future, X
next time and U until) must be directly preceded by a path quantifier. The path quantifier
can either be an A (for all computation paths) or an E (for some computation path). Thus,
some typical CTL operators are $\mathbf{AG}\,f$, which will hold in a state provided that $f$ holds at
all points (globally) along all possible computation paths starting from that state, and $\mathbf{EF}\,f$,
which will hold in a state provided that there is a computation path such that $f$ holds at
some point in the future on the path.

For  explaining  our  verification  procedure,  it  is  convenient  to  express  the  CTL  operators 
with  universal  path  quantifiers  in  terms  of  the  operators  with  existential  path  quantifiers, 
taking advantage of the duality between universal and existential quantification.
Consequently, in our description of the syntax and semantics of CTL, we specify the existential
path quantifiers directly and treat the universal path quantifiers as syntactic abbreviations.

CTL formulas are constructed from atomic propositions using boolean connectives and
temporal operators. When verifying a circuit, the set of atomic propositions is typically
equal to the set $V$ of state variables of the circuit. If $v$ is an atomic proposition in $V$, then
the formula $v$ is true of a circuit state if and only if the state variable $v$ is high in that state.
The formal syntax of CTL formulas is given by the following two rules:

1. every atomic proposition $v$ in $V$ is a formula in CTL; and

2. if $f$ and $g$ are CTL formulas, then so are $\neg f$, $f \lor g$, $\mathbf{EX}\,f$, $\mathbf{E}[f \,\mathbf{U}\, g]$ and $\mathbf{EG}\,f$.

Let $S_0$ be the set of initial states of a circuit, and let $N$ be a transition relation. We now
define the semantics of CTL for such a system. A path from the state $s_0$ is an infinite sequence
of states $s_0 s_1 s_2 \ldots$ such that $N(s_i, s_{i+1})$ holds for every $i$. The propositional connectives $\neg$
and $\lor$ have their usual meanings of negation and disjunction. The other propositional
operators can be defined in terms of these. X is the next time operator: $\mathbf{EX}\,f$ will be true
in a state $s_0$ if and only if there is a path $s_0 s_1 \ldots$ from $s_0$ such that $f$ is true at $s_1$. U is the
until operator: $\mathbf{E}[f \,\mathbf{U}\, g]$ will be true in a state $s_0$ if and only if there exists a path $s_0 s_1 \ldots$
from $s_0$ such that $g$ holds at some $s_j$ and $f$ holds at all $s_i$ for which $i < j$. The operator G
is used to express the invariance of some property over time: $\mathbf{EG}\,f$ will be true at a state $s_0$
if there is a path $s_0 s_1 \ldots$ from $s_0$ such that $f$ holds at each state on the path. If $f$ is true in
state $s$, we say that $s$ satisfies $f$ and write $s \models f$. We say that the system satisfies $f$ if $s \models f$
for all states $s$ in $S_0$. We will identify a CTL formula $f$ with the set $\{\, s \mid s \models f \,\}$ of states
that make $f$ true. We also use the following syntactic abbreviations for CTL formulas:

•  $\mathbf{AX}\,f \equiv \neg \mathbf{EX}\,\neg f$ which means that $f$ holds at all successor states of the current state
($f$ must hold at every next state).

•  $\mathbf{EF}\,f \equiv \mathbf{E}[\mathit{true} \,\mathbf{U}\, f]$ which means that for some path, there exists a state on the path
at which $f$ holds ($f$ is possible in the future).

•  $\mathbf{AF}\,f \equiv \neg \mathbf{EG}\,\neg f$ which means that for every path, there exists a state on the path at
which $f$ holds ($f$ is inevitable in the future).

•  $\mathbf{AG}\,f \equiv \neg \mathbf{EF}\,\neg f$ which means that for every path, at every node on the path $f$ holds
($f$ holds invariantly along all paths).

•  $\mathbf{A}[f \,\mathbf{U}\, g] \equiv \neg \mathbf{E}[\neg g \,\mathbf{U}\, (\neg f \land \neg g)] \land \neg \mathbf{EG}\,\neg g$ which means that for every path, there exists
an initial prefix of the path such that $g$ holds at the last state of the prefix and $f$ holds
at all other states along the prefix ($f$ holds until $g$ holds, along all paths).
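
The abbreviations above can be written as small constructor functions that expand into the
existential core of the logic. The tuple encoding of formulas, the function names, and the use
of a distinguished constant `TRUE` are assumptions made for this sketch; conjunction is
expressed through negation and disjunction, as in the syntax rules.

```python
TRUE = ("true",)                         # assumed constant for the formula "true"


def Atom(name): return ("atom", name)
def Not(f): return ("not", f)
def Or(f, g): return ("or", f, g)
def And(f, g): return Not(Or(Not(f), Not(g)))   # De Morgan, from "not" and "or"
def EX(f): return ("EX", f)
def EU(f, g): return ("EU", f, g)
def EG(f): return ("EG", f)


# Syntactic abbreviations, expanded exactly as in the list above.
def AX(f): return Not(EX(Not(f)))
def EF(f): return EU(TRUE, f)
def AF(f): return Not(EG(Not(f)))
def AG(f): return Not(EF(Not(f)))
def AU(f, g): return And(Not(EU(Not(g), And(Not(f), Not(g)))), Not(EG(Not(g))))


p = Atom("p")
expanded = AG(p)
```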

6.2  Model  Checking 

Model checking is the problem of determining whether a given CTL formula $f$ is true in a
given state transition graph. There is a program called EMC (Extended Model Checker)
that verifies the truth of a formula in a model by using efficient graph-traversal techniques.
If the model is represented as a state transition graph, the complexity of the algorithm is
linear in the size of the graph and in the length of the formula. The algorithm is quite fast
in practice [5, 17]. However, an explosion in the size of the model may occur when the state
transition graph is extracted from a circuit, particularly if the circuit contains many registers
or other memory elements.

In this section, we present a model checking algorithm for CTL which uses BDDs as its
internal representation, in order to avoid explicitly enumerating the states of the model. We
call this symbolic model checking. The algorithm is defined by a procedure Check that takes
the CTL formula to be checked as its argument. It returns a BDD $S(V)$ that represents
exactly those states of the system that satisfy the formula. Of course, the output of Check
depends on the system being checked; this parameter is implicit in the discussion below.
The set $S_0$ of initial states is represented by a BDD $S_0(V)$, and the transition relation $N$
is represented by the BDD $N(V, V')$ as discussed earlier. We assume that $N$ is total; that
is, every state has some successor state. This is true for transition relations of the forms
described in section 3.

We define Check inductively over the structure of CTL formulas. If f is an atomic
proposition v, then Check(f) is simply the BDD for v. The inductive steps for formulas of the
form EX f, E[f U g], and EG f are given in terms of intermediate procedures:

Check(EX f) = CheckEX(Check(f)),

Check(E[f U g]) = CheckEU(Check(f), Check(g)),

Check(EG f) = CheckEG(Check(f)).

The definitions of these intermediate procedures are given below. Notice that these
intermediate procedures take boolean formulas as their arguments, while Check takes a CTL
formula as its argument. The cases of CTL formulas of the form $f \vee g$ or $\neg f$ are handled
using the standard algorithms for computing boolean connectives with BDDs. Since AX f,
A[f U g] and AG f can all be rewritten using just the above operators, this definition of
Check covers all CTL formulas.
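
The rewriting of the universal operators into the existential ones can be sketched as a small recursive function. This is only an illustration, not the implementation used in the paper: the tuple encoding of formulas and the names `to_existential`, `Not`, `Or` and `And` are assumptions made here, and EF f is treated as the abbreviation E[true U f].

```python
# Formulas are nested tuples, e.g. ("AG", ("atom", "p")).
# This encoding is hypothetical and chosen only for illustration.
TRUE = ("true",)

def Not(f):
    return ("not", f)

def Or(f, g):
    return ("or", f, g)

def And(f, g):
    # Conjunction expressed with "not" and "or" (De Morgan).
    return Not(Or(Not(f), Not(g)))

def to_existential(f):
    """Rewrite AX, AG, A[.U.] and EF using only EX, EU, EG, not, or."""
    op = f[0]
    if op in ("true", "atom"):
        return f
    if op == "not":
        return Not(to_existential(f[1]))
    if op == "or":
        return Or(to_existential(f[1]), to_existential(f[2]))
    if op in ("EX", "EG"):
        return (op, to_existential(f[1]))
    if op == "EU":
        return ("EU", to_existential(f[1]), to_existential(f[2]))
    if op == "EF":   # EF f = E[true U f]
        return ("EU", TRUE, to_existential(f[1]))
    if op == "AX":   # AX f = not EX not f
        return Not(("EX", Not(to_existential(f[1]))))
    if op == "AG":   # AG f = not EF not f
        return Not(("EU", TRUE, Not(to_existential(f[1]))))
    if op == "AU":   # A[f U g] = not E[not g U (not f and not g)] and not EG not g
        a, b = to_existential(f[1]), to_existential(f[2])
        return And(Not(("EU", Not(b), And(Not(a), Not(b)))),
                   Not(("EG", Not(b))))
    raise ValueError("unknown operator: %r" % (op,))
```

After this pass a checker only needs cases for atoms, negation, disjunction, EX, EU and EG.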

The formula EX f is true in a state if and only if there exists a path from that state for
which the second state on the path satisfies f. Since we have assumed that the transition
relation is total, this is equivalent to there being a successor of the state which satisfies f.
Thus, we define CheckEX such that

$$\mathrm{CheckEX}(S(V)) = \exists_{v' \in V'} \left[ S(V') \wedge N(V, V') \right].$$

Compare the definition of CheckEX to the relational product in the definition of $S_{k+1}$ in
Section 4. They are quite similar, except that the first computes the set of states from
which a state in S can be reached, while the second computes the states that can be reached
from a state in $S_k$. In other words, CheckEX performs one step of a backward reachability
search instead of a forward reachability search. The techniques described in section 5 for
computing relational products can be used here. (However, as discussed in subsection 5.3,
we may wish to use different partition orderings for the forward and backward reachability
computations when using conjunctive partitioning.)
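
The difference between the backward step of CheckEX and the forward step used in reachability can be seen on an explicit-state stand-in. In this sketch, Python sets of states play the role of BDDs and a set of (state, successor) pairs plays the role of $N(V, V')$; the function names and the four-state example are hypothetical.

```python
def check_ex(S, N):
    """Backward step: states that have at least one successor in S."""
    return {s for (s, t) in N if t in S}

def image(S, N):
    """Forward step: states reachable in one transition from S."""
    return {t for (s, t) in N if s in S}

# Hypothetical total transition relation: 0 -> 1 -> 2 -> 3 -> 3
N = {(0, 1), (1, 2), (2, 3), (3, 3)}

predecessors_of_2 = check_ex({2}, N)   # states from which 2 can be reached in one step
successors_of_2 = image({2}, N)        # states reached from 2 in one step
```

Both functions quantify out one side of the relation; they differ only in which side is kept.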

Recall that the formula E[f U g] means that there is a computation beginning in the
current state in which g is true in some future state s, and f is true in all the states
preceding s. This means that either g is true in the current state, or f is true in the current
state and there exists a successor state in which E[f U g] is true. In other words, the set of
states satisfying E[f U g] is the least fixed point of the predicate transformer defined by

$$(F(S))(V) = S_g(V) \vee \left( S_f(V) \wedge \mathrm{CheckEX}(S(V)) \right),$$

where $S_g$ and $S_f$ are the sets of states satisfying g and f, respectively [16]. The algorithm for
CheckEU works by finding the least fixed point of the above predicate transformer. This
fixed point can be computed with either the direct iteration or iterative squaring methods
described earlier.
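
A direct-iteration version of this least fixed point can be sketched with explicit sets standing in for BDDs. The names `check_ex` and `check_eu` and the example system are assumptions for illustration; iterative squaring is not shown.

```python
def check_ex(S, N):
    """States with at least one successor in S."""
    return {s for (s, t) in N if t in S}

def check_eu(Sf, Sg, N):
    """Least fixed point of F(S) = Sg | (Sf & CheckEX(S)), by direct iteration."""
    S = set()                      # start from the empty set (false)
    while True:
        nxt = Sg | (Sf & check_ex(S, N))
        if nxt == S:               # F(S) = S: fixed point reached
            return S
        S = nxt

# Hypothetical system: 0 -> 1 -> 2 -> 3 -> 3, and an isolated loop 4 -> 4
N = {(0, 1), (1, 2), (2, 3), (3, 3), (4, 4)}
Sf = {0, 1, 4}                     # states satisfying f
Sg = {2}                           # states satisfying g
eu = check_eu(Sf, Sg, N)
```

State 4 satisfies f forever but never reaches a g state, so it is excluded; states 0 and 1 are added one backward step at a time.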

The formula EG f states that there exists a computation beginning with the current
state in which f is globally (invariantly) true. This means that f is true in the current state,
and EG f is true in some successor state. This condition is the greatest fixed point of the
predicate transformer

$$(F(S))(V) = S_f(V) \wedge \mathrm{CheckEX}(S(V)),$$

where $S_f$ is the set of states satisfying f. CheckEG computes this fixed point, either by
direct iteration or iterative squaring.

After determining the set S of states that satisfy the formula f, the algorithm checks
whether $S_0$ is a subset of S (that is, whether $\neg S_0(V) \vee S(V)$ is the BDD representing true).
If it is, then the system satisfies f.
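
The greatest fixed point for EG f and the final comparison against the initial states can be sketched in the same explicit-set style. Sets again stand in for BDDs, and `check_eg`, `system_satisfies` and the example system are hypothetical names chosen for illustration.

```python
def check_ex(S, N):
    """States with at least one successor in S."""
    return {s for (s, t) in N if t in S}

def check_eg(Sf, N):
    """Greatest fixed point of F(S) = Sf & CheckEX(S), by direct iteration.

    Iteration may start from Sf: when N is total, F applied to the set of
    all states is exactly Sf.
    """
    S = set(Sf)
    while True:
        nxt = Sf & check_ex(S, N)
        if nxt == S:
            return S
        S = nxt

def system_satisfies(S0, S):
    """The system satisfies the formula when every initial state is in S."""
    return S0 <= S

# Hypothetical system: a loop 0 <-> 1, an exit 1 -> 2 -> 3, and 3 -> 3
N = {(0, 1), (1, 0), (1, 2), (2, 3), (3, 3)}
Sf = {0, 1, 2}                     # states satisfying f
eg = check_eg(Sf, N)
```

State 2 satisfies f but every path from it leaves the f states, so the iteration removes it.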

6.3 Fairness Constraints

Next, we consider the issue of fairness. In many cases, we are only interested in
correctness along fair computation paths. For example, if we are verifying an asynchronous circuit
with an arbiter, we may wish to consider only those executions in which the arbiter does not
ignore one of its request inputs forever. This type of property cannot be expressed directly
in CTL. In order to handle such properties we must modify the semantics of CTL slightly.
A fairness constraint can be an arbitrary formula of the logic. A path is said to be fair with
respect to a set of fairness constraints if each constraint holds infinitely often along the path.
The path quantifiers in CTL formulas are then restricted to fair paths. In the remainder of
this section we describe how to modify the algorithm above to handle fairness constraints.
We assume the fairness constraints are given by a set of CTL formulas $H = \{h_1, \ldots, h_n\}$.

We define a new procedure CheckFair for checking CTL formulas relative to the
fairness constraints in H. We do this by giving definitions for new intermediate procedures
CheckFairEX, CheckFairEU, and CheckFairEG which correspond to the
intermediate procedures used to define Check.

Consider the formula EG f given fairness constraints H. The formula means that there
exists a path beginning with the current state on which f holds globally (invariantly) and
each formula in H holds infinitely often. The set S of such states is the largest set with the
following two properties:

1. all of the states in S satisfy f, and

2. for all fairness constraints $h_k \in H$ and all states $s \in S$, there is a sequence of states of
length one or greater from s to a state in S satisfying $h_k$, such that all states on the
path satisfy f.

It is easy to show that if these conditions hold, each state in the set is the beginning of an
infinite computation path on which f is always true, and on which every formula in H
holds infinitely often. Thus, CheckFairEG will compute the greatest fixed point of the
predicate transformer given by

$$(F(S))(V) = S_f(V) \wedge \bigwedge_{k=1}^{n} \mathrm{CheckEX}\left(\mathrm{CheckEU}\left(S_f(V),\, S(V) \wedge \mathrm{Check}(h_k)\right)\right),$$

where $S_f$ is the set of states satisfying f under the fairness constraints H. The fixed point
can be evaluated in the same manner as before. The main difference is that each time
the above expression is evaluated, several nested fixed point computations are done (inside
CheckEU).

Checking EX f and E[f U g] under fairness constraints is simpler. The set of all states
which are the start of some fair computation is

$$\mathit{fair} = \mathrm{CheckFair}(\mathbf{EG}\,\mathit{true}).$$

The formula EX f is true under fairness constraints in a state s if and only if there is
a successor state s' of s such that s' satisfies f and s' is at the beginning of some fair
computation path. Thus, the formula EX f (under fairness constraints) is equivalent to the
formula $\mathbf{EX}(f \wedge \mathit{fair})$ (without fairness constraints). Therefore, we define

$$\mathrm{CheckFairEX}(S_f(V)) = \mathrm{CheckEX}(S_f(V) \wedge \mathit{fair}(V)).$$

Similarly, the formula E[f U g] (under fairness constraints) is equivalent to the formula
$\mathbf{E}[f\,\mathbf{U}\,(g \wedge \mathit{fair})]$ (without fairness constraints). Therefore, we define

$$\mathrm{CheckFairEU}(S_f(V), S_g(V)) = \mathrm{CheckEU}(S_f(V), S_g(V) \wedge \mathit{fair}(V)).$$
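
The three fair procedures can be sketched together on explicit sets, which stand in for BDDs. All names and the three-state example are hypothetical; each element of `H` is assumed to be the already-computed set of states satisfying one constraint $h_k$, and at least one constraint is assumed.

```python
def check_ex(S, N):
    """States with at least one successor in S."""
    return {s for (s, t) in N if t in S}

def check_eu(Sf, Sg, N):
    """Least fixed point of S -> Sg | (Sf & CheckEX(S))."""
    S = set()
    while True:
        nxt = Sg | (Sf & check_ex(S, N))
        if nxt == S:
            return S
        S = nxt

def check_fair_eg(Sf, H, N):
    """Greatest fixed point of S -> Sf & AND_k CheckEX(CheckEU(Sf, S & h_k))."""
    S = set(Sf)
    while True:
        nxt = set(Sf)
        for h in H:                # one nested least fixed point per constraint
            nxt &= check_ex(check_eu(Sf, S & h, N), N)
        if nxt == S:
            return S
        S = nxt

def check_fair_ex(Sf, fair, N):
    return check_ex(Sf & fair, N)

def check_fair_eu(Sf, Sg, fair, N):
    return check_eu(Sf, Sg & fair, N)

# Hypothetical system: 0 -> 0, 0 -> 1, 1 -> 0, 1 -> 2, 2 -> 2
states = {0, 1, 2}
N = {(0, 0), (0, 1), (1, 0), (1, 2), (2, 2)}
H = [{1}]                          # the single constraint holds only in state 1
fair = check_fair_eg(states, H, N) # plays the role of CheckFair(EG true)
```

In this example state 2 can only loop on itself and never revisits state 1, so no fair path starts there and it drops out of `fair`.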

7 Verifying Synchronous Circuits

This section gives empirical results for verifying synchronous circuits using both CTL model
checking and reachability analysis. We begin by applying model checking to a simple pipeline
circuit. Reachability analysis is applied to two standard benchmark circuits, MINMAX and
KEY.


7.1 Pipelined ALU

The pipeline considered in this section performs three-address arithmetic and logical
operations on operands stored in a register file. The circuit is a generalized version of one described
in an earlier conference paper [12]. Figure 4 shows a block diagram for the pipeline. The
number of pipe registers can be varied; if s is the number of pipe registers, then executing
an instruction requires s + 2 cycles.

1.  During  the  first  cycle  of  the  instruction,  operands  are  read  from  the  register  file  into 
the  instruction  operand  registers. 

2.  During  the  second  cycle,  the  result  of  the  operation  is  computed  and  stored  in  the  first 
pipe  register. 

3.  In  cycles  three  through  s  +  1,  the  result  is  passed  between  pipe  registers. 

4.  In  the  last  cycle,  the  result  is  written  back  to  the  register  file. 

We have included extra pipe registers in this version of the pipeline to test how the performance
of the model checker depends on the number of pipe registers. In a real circuit, operations
would typically be performed between some of the pairs of pipe registers, but in our example,
results are just propagated unchanged.

Each instruction specifies the source and destination registers and the operation to
perform. In addition, the pipeline has a stall input that indicates that the instruction is invalid
and should be ignored. More specifically, the instruction's destination register should not
be affected if the stall input is true. The stall signal might, for example, be used to indicate
an instruction cache miss; the signal would be asserted until an instruction is fetched from
main memory. In order to allow results to be used before they are actually written into the
register file, data can be fed from the ALU output or from one of the pipe registers back to
the ALU operand registers. We experimented with a number of versions of the pipeline with
varying numbers of registers r, register widths w, numbers of pipe stages s, and numbers of
operations o.

The specification of the pipeline is given in CTL. For simplicity of exposition, we fix the
number of general registers r at 2 and the number of pipe registers s at 1, and we assume
that the pipeline does only exclusive-or operations. In the actual verification, we used more
complex circuits with more operations. The specification that we used consists of two parts.
The first specifies that the destination register is updated correctly. This is described by a
set of formulas of the following form:

$$\mathbf{AG}\left(\neg\mathit{stall} \rightarrow \left((\mathit{src1op}_i \oplus \mathit{src2op}_i) \equiv \mathit{result}_i\right)\right).$$

[Figure 4: Block diagram of the pipeline. Recoverable labels: "Read ports", "Write port".]

Here, $\mathit{src1op}_i$ and $\mathit{src2op}_i$ are abbreviations for formulas that represent the value of the ith
bit of the two source operands, and $\mathit{result}_i$ is a formula that represents the ith bit of the
result written into the register file. The overall formula states "if the pipeline is not being
stalled, then the ith bit of the result of the current operation should be the exclusive-or of
the ith bits of the two source operands".

In order to express $\mathit{src1op}_i$, $\mathit{src2op}_i$ and $\mathit{result}_i$, we need a way of expressing the value
stored in a bit of a register some number of cycles k in the future. Since the only
nondeterminism in the circuit is input nondeterminism, and since the inputs cannot affect the state
of the register file until 3 cycles in the future (assuming s equals 1), this can be done using the
CTL AX operator. That is,

$$\underbrace{\mathbf{AX}\,\mathbf{AX} \cdots \mathbf{AX}}_{k}\,\mathit{reg}_{j,i},$$

which we abbreviate as $\mathbf{AX}^k\,\mathit{reg}_{j,i}$, represents the value of bit i of register j in k cycles,
provided $k \le 3$. We can check the assumption that the inputs do not affect the register file
state before 3 cycles elapse by verifying that $\mathbf{EX}^k\,\mathit{reg}_{j,i}$ and $\mathbf{AX}^k\,\mathit{reg}_{j,i}$ are equivalent for k
up to 3. The timings given below do not include this check; it increases the times by a factor
of two. Now $\mathit{src1op}_i$ is either $\mathbf{AX}^2\,\mathit{reg}_{0,i}$ or $\mathbf{AX}^2\,\mathit{reg}_{1,i}$, depending on whether the first source
address is 0 or 1. The $\mathbf{AX}^2$ accounts for the pipeline latency; in 2 cycles, all the values
currently being computed will have been written back into the register file. Thus, we obtain

$$\mathit{src1op}_i = (\neg\mathit{src1addr}_0 \wedge \mathbf{AX}^2\,\mathit{reg}_{0,i}) \vee (\mathit{src1addr}_0 \wedge \mathbf{AX}^2\,\mathit{reg}_{1,i}).$$

Here, $\mathit{src1addr}_i$ is the ith bit of the first source address input. The formula for $\mathit{src2op}_i$ is
analogous. The formula for $\mathit{result}_i$ is also similar, except we use the values in the register
file in 3 cycles (after the operation is completed), and we select based on $\mathit{destaddr}$, the
destination address register:

$$\mathit{result}_i = (\neg\mathit{destaddr}_0 \wedge \mathbf{AX}^3\,\mathit{reg}_{0,i}) \vee (\mathit{destaddr}_0 \wedge \mathbf{AX}^3\,\mathit{reg}_{1,i}).$$
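
The role of the latency bound can be illustrated on a toy circuit. In this sketch a two-bit state (a, b) is assumed, where on each cycle b takes the old value of a and a takes a free input, so the input cannot influence b until two cycles in the future (the analogue of the 3-cycle bound above). Explicit sets stand in for BDDs, and all names are hypothetical.

```python
from itertools import product

def check_ex(S, N):
    """States with at least one successor in S."""
    return {s for (s, t) in N if t in S}

def check_ax(S, N, states):
    """AX f = not EX not f: states all of whose successors are in S."""
    return states - check_ex(states - S, N)

def iterate(step, S, k):
    """Apply a one-step operator k times."""
    for _ in range(k):
        S = step(S)
    return S

# Hypothetical circuit: next state of (a, b) is (input, a) for input in {0, 1}.
states = set(product((0, 1), repeat=2))
N = {((a, b), (i, a)) for (a, b) in states for i in (0, 1)}
reg = {s for s in states if s[1] == 1}      # states in which bit b is 1

ex1 = iterate(lambda S: check_ex(S, N), reg, 1)
ax1 = iterate(lambda S: check_ax(S, N, states), reg, 1)
ex2 = iterate(lambda S: check_ex(S, N), reg, 2)
ax2 = iterate(lambda S: check_ax(S, N, states), reg, 2)
```

One cycle ahead the two operators agree, so the value of b is determined by the current state; two cycles ahead they disagree because the free input has reached b. The same comparison, with the pipeline's own latency, is the check described in the text.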

The other part of the specification describes what happens to the registers not being
written (or to all the registers when the pipeline stalls). In particular, the register should
not be altered by the current operation. For example, for register 1:

$$\mathbf{AG}\left((\mathit{stall} \vee \neg\mathit{destaddr}_0) \rightarrow (\mathbf{AX}^2\,\mathit{reg}_{1,i} \equiv \mathbf{AX}^3\,\mathit{reg}_{1,i})\right).$$

Note that a number of common subformulas, such as the formulas $\mathbf{AX}^k\,\mathit{reg}_{j,i}$, appear
throughout the specification. In the experiments described below, the set of states satisfying each
of these subformulas was computed only once and then saved.

We performed most of the experiments using a partitioned transition relation to represent
the circuit. From the block diagram, we see that the circuit decomposes naturally into pieces.
We used this decomposition as a starting point for breaking the transition relation into parts.
Some of the parts, such as the register file, were found to require large BDDs to represent;
we broke these into more pieces. We also found that we could combine some of the parts,
such as most of the pipe registers, without increasing the number of BDD nodes required;
we did this to decrease overhead. The final decomposition had the following pieces:

1. control logic;

2. the first pipe register;

3. the other pipe registers;

4. the first ALU operand register;

5. the second ALU operand register; and

6. one piece for each general register.

The ordering above was also the ordering used for processing the transition relation. With
this ordering, the number of variables in intermediate results never exceeded the number of
state variables by more than w, the register width. We found that the sizes of the
intermediate results with this ordering increased monotonically during each step; thus, breaking the
transition relation into pieces did not result in having to manipulate larger state set BDDs
than would have been necessary with a single monolithic BDD representing the transition
relation. This is an important point; in many applications involving BDDs, it is the number
of nodes in intermediate results (not the final result) that limits the size of the problem that
can be handled.
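
The effect of the processing order on the number of variables in intermediate results can be illustrated by tracking supports only. This sketch counts variables, not BDD nodes, and assumes that a variable is quantified out as soon as no pending piece mentions it; the function name and the three-bit shift chain are hypothetical.

```python
def intermediate_support_sizes(start, pieces, quantified):
    """Number of variables in the running product after each piece is processed.

    start:      variables of the initial state-set BDD
    pieces:     supports of the transition relation pieces, in processing order
    quantified: variables to be eliminated by the relational product
    """
    support = set(start)
    sizes = []
    for i, piece in enumerate(pieces):
        support |= piece                        # conjoin the next piece
        later = set().union(*pieces[i + 1:])    # variables still needed later
        support -= {v for v in quantified if v not in later}
        sizes.append(len(support))
    return sizes

# Hypothetical shift chain a -> b -> c; primed names are next-state variables.
Na = {"a", "a'"}            # a' depends on a
Nb = {"a", "b", "b'"}       # b' depends on a and b
Nc = {"b", "c", "c'"}       # c' depends on b and c
current = {"a", "b", "c"}   # a forward step quantifies the current-state variables

forward_order = intermediate_support_sizes(current, [Na, Nb, Nc], current)
reverse_order = intermediate_support_sizes(current, [Nc, Nb, Na], current)
```

With the first order the running product briefly holds four variables; with the reversed order it never holds more than three, because each current-state variable can be eliminated as soon as its last dependent piece has been processed.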

In the BDD variable ordering that we used, the source address registers are nearest to
the root. The bits of these registers are interleaved. These are followed by variables which
make up the destination address shift chain (this is a chain of shift registers that are used to
hold the destination address for an operation until the result of the operation is written back
into the register file). For each stage in the chain, starting with the leftmost (input) stage,
there is a stall bit followed by a destination address register. Next come the two opcode
shift registers, with their bits interleaved. The operand registers, general registers and pipe
registers, interleaved, are at the end of the ordering. All registers are arranged with the most
significant bit closest to the root of the BDD, since this results in smaller BDDs for the
operations used.

As mentioned, we experimented with a number of versions of the pipeline with varying
numbers of registers r, register widths w, numbers of pipe stages s, and numbers of
operations o. For each version, we collected information on the sizes of the BDDs needed to
represent the transition relation and state sets, and on the time required to do the
verification. The following table shows the rate of growth in the sizes of the various pieces of
the transition relation as a function of the parameters. These rates of growth were found
by studying "profiles" of the BDDs (histograms of the number of nodes labeled with each
variable). By considering the circuit's operation and examining how the profiles changed as
the parameters varied, we were able to exactly account for the structure of the BDDs needed
to represent the transition relation.

    control logic             O(sr log r)
    pipe registers            O(ws)
    ALU operand registers
    each general register

The total number of BDD nodes needed to represent the transition relation grows linearly
with each parameter except r, for which it grows at a rate of r log r. The log r factors arise
because an extra addressing bit is needed each time r passes a power of two. The number
of partitions in the transition relation increased linearly with r, and did not depend on
w, s or o. The number of BDD nodes in each piece of the transition relation was typically
between 10 and 500. No piece ever had more than 1,500 nodes. The way the sizes of the
pipe registers and ALU operand registers vary with o depends on the exact operations. The
ones we used were addition, subtraction, and bitwise logical operations. With this set, the
control logic grew O(log o), the pipe registers and ALU operand registers grew O(o), and
the general registers did not vary with o.

To make it clear how the above bounds on BDD sizes are derived, we consider one specific
example: the transition relation for the control logic. The other pieces of the transition
relation for the pipelined ALU were analysed in a similar way. The control logic consists of
two parts: the opcode shift chain and the destination address register shift chain. Each shift
chain is used to store information about an operation until the time that it is to be used.
The opcode is delayed for one cycle while the ALU operand registers are being loaded, and
hence the opcode shift chain is described by the following transition relation:

$$\bigwedge_{i=0}^{\lceil \log o \rceil - 1} \mathit{opcode}'_{1,i} \equiv \mathit{opcode}_{0,i}.$$

Here, $\mathit{opcode}_{0,i}$ is the ith bit of the input opcode, and $\mathit{opcode}'_{1,i}$ is the (next state value of
the) ith bit of the register used to control the ALU. With the variable ordering described
above, this transition relation requires O(log o) BDD nodes to represent. The destination
address register shift chain is used to hold the destination register number until the result of
the operation reaches the end of the pipeline. Then the last register in the chain is used to
control the writeback into the register file. The transition relation that describes the shift
chain is:

$$\bigwedge_{i=1}^{s+1} \; \bigwedge_{j=0}^{\lceil \log r \rceil - 1} \mathit{dest}'_{i,j} \equiv \mathit{dest}_{i-1,j}.$$

In this expression, $\mathit{dest}_{0,j}$ is the jth bit of the destination input. Because of the variable
ordering used, the BDD for this transition relation consists of s + 2 sections, one for each
$\mathit{dest}_i$. At the end of each section, the BDD has O(r) width, since the value of $\mathit{dest}_i$ must
be "remembered" in order to check that $\mathit{dest}'_{i+1}$ is correct. In addition, each bit of $\mathit{dest}_i$
is "forgotten" before encountering the corresponding bit of $\mathit{dest}_{i+1}$. Hence the width of
the BDD is in fact O(r) everywhere. The number of variables that this part of the BDD
depends on is bounded by $2(s + 1)\lceil \log r \rceil$ (the factor of two accounts for current and next
state variables), and hence, the total BDD size is O(sr log r). The conjunction of the BDDs
for the first and second parts gives a BDD of size O(sr log r + log o).
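
The arithmetic behind these bounds can be written out as follows. This is a sketch using only the quantities stated above; the notation $|B|$ for the node count of a BDD is introduced here for illustration, and the second line assumes that the sizes of the two chains simply add, which is what the stated result implies given that the chains share no variables and occupy separate parts of the ordering.

```latex
|B_{\mathit{dest}}| \;\le\;
  \underbrace{2(s+1)\lceil \log r \rceil}_{\text{number of variables}}
  \cdot \underbrace{O(r)}_{\text{width at each variable}}
  \;=\; O(sr \log r),
\qquad
|B_{\mathit{dest}} \wedge B_{\mathit{opcode}}|
  \;=\; O(sr \log r) + O(\log o)
  \;=\; O(sr \log r + \log o).
```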

We also studied the BDDs representing the various state sets in the verification and
used profiles to determine their rates of growth. Since most of the time and space for each
verification was used computing and representing the value of the destination register at
the end of the current operation, we concentrated on these. Again, by understanding the
information captured by the BDDs, we were able to determine how the sizes of the BDDs
were affected by the various parameters. The number of nodes in these particular
state set BDDs grows as $O(rs(r + s) \log r + wo^2(r + s) + wo(r + s)^2)$ (this growth rate was
obtained using the same type of analysis as that above). The largest BDDs we encountered
had slightly less than 12,500 nodes; typical sizes were about 1,000 nodes.

We performed the tests described above using a CTL model checker written mostly in
the T dialect of LISP [33]. The actual BDD manipulation routines are written in C and are
roughly comparable to the package described by Brace, Rudell and Bryant [4]. The model
checker was run on a SPARCstation 1+. Figure 5 shows how the verification time depends
on the parameters r, w, s and o. This plot (and the other plots in this paper) uses a log scale
on both axes. On such a plot, the polynomial relationship $y = x^n$ appears as a straight line
with slope n. The following table shows the values used for the fixed parameters in these
tests.

              r     w     s     o
    vary r    -     1     1     1
    vary w    4     -     1     1
    vary s    2     2     -     1
    vary o    2     4     1     -

Figure 5: Pipeline circuit verification times

The verification time is dominated by the time required to compute the state sets for
the subformulas $\mathbf{AX}^{s+2}\,\mathit{reg}_{j,i}$. There are rw such formulas. Each computation of this form
involves one call to RelProd for each piece of the transition relation. The verification times
can be accounted for as follows.

1. As r increases from $2^i + 1$ to $2^{i+1}$ for some i, the number of $\mathbf{AX}^{s+2}$ computations
increases linearly. The number of pieces in the transition relation also increases linearly,
and in each call to RelProd, the size of the result BDD increases linearly. If we make
the assumption that the time to do a BDD operation is linear in the size of the result
BDD, then we would expect these three linear increases to produce cubic growth in
the verification time. The slopes of the best fit lines for r equal to 9 through 16 and
for r equal to 17 through 32 are both 2.5. In the general case where r ranges over
more than a factor of 2, we would expect the time to grow at $O(r^3 \log r)$, but we do
not have enough data to completely substantiate this conjecture.

2. As w increases, the number of $\mathbf{AX}^{s+2}$ computations increases linearly and the size
of the BDD resulting from each operation increases linearly. This leads us to expect
quadratic growth in the verification time. The slope of the best fit line for w equal to
17 through 32 is 2.1.

3. As s increases, the number of AX computations needed to evaluate each formula of
the form $\mathbf{AX}^{s+2}\,\mathit{reg}_{j,i}$ increases linearly and the sizes of the BDDs produced within
these steps increase linearly. When computing

$$\mathbf{AX}^{s+2}\,\mathit{reg}_{j,i} = \mathbf{AX}\,\mathbf{AX}^{s+1}\,\mathit{reg}_{j,i},$$

the BDDs during the last AX operation grow quadratically. Overall we expect
quadratic growth in the verification time. The slope of the best fit line for s equal to 33
through 64 is 1.8.

4. How the verification time varies with o depends on the particular operations used.
The number of $\mathbf{AX}^{s+2}$ computations does not change. The BDDs for the state sets
grow $O(o^2)$ for the operations mentioned above. We would thus expect quadratic
growth in the verification time. The slope of the best fit line for o equal to 9 through
16 is 1.7.
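
Best-fit slopes of the kind quoted in the list above can be estimated with an ordinary least-squares fit on the logarithms of both axes. The sketch below shows one way to do this; the function name is hypothetical and the data points are synthetic (generated from an exact power law), not measurements from the paper.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x).

    For data following y = c * x**n the result is n.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    den = sum((a - mean_x) ** 2 for a in lx)
    return num / den

# Synthetic example: times that grow exactly as r**2.5 for r = 9..16.
rs = list(range(9, 17))
times = [0.01 * r ** 2.5 for r in rs]
slope = loglog_slope(rs, times)
```

On real timings the fitted slope would only approximate the exponent, which is why the measured values differ somewhat from the predicted growth rates.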

It is important to note that in all cases, the verification time is growing polynomially in the
number of components of these example circuits. Polynomial verification times were also
documented in earlier work [12, 13, 14]. Other researchers [1] using symbolic techniques
have demonstrated verification times that grow sublinearly in the number of states of the
system, but still exponentially in the number of components.

For comparison, we also ran the verification with a monolithic transition relation. With
8 bit registers, the monolithic transition relation required more than 75,000 BDD nodes to
represent, compared with fewer than 750 nodes using a partitioned transition relation, a
difference of more than two orders of magnitude. In addition, the verification needed nearly
an order of magnitude more time. We also note that combining parts of a transition relation
can result in higher asymptotic complexity. For example, the total number of nodes in the
BDDs that represent the register file in the partitioned transition relation is O(r log r), while
the BDD for their conjunction has $O(r^2 \log r)$ nodes.

Partitioned transition relations can also be used to verify larger pipelines than those
above. We verified a 32-bit wide pipeline with 8 general registers, 2 pipe registers, and
one operation. This example had 406 state variables resulting in more than $10^{120}$ reachable
states, and the verification took 1 hour and 25 minutes of CPU time on a SPARCstation 1+.

7.2 Other Synchronous Examples

This section gives empirical results for computing the set of reachable states of the MINMAX
and KEY benchmarks.

The circuits of the MINMAX benchmark each consist of three control inputs and a data
path of parameterisable width w. The data path is made up of a w bit input and three w bit
state registers. The variable ordering we used is quite standard: control at top, data path
variables interleaved and ordered most-significant bit to least-significant bit. We considered
two different partitionings of the transition relation. The first had one BDD per bit of state,
resulting in 3w BDDs each of size O(w). With the ordering used, there was essentially no
sharing of nodes between these BDDs, so the total number of nodes was $O(w^2)$. The second
partitioning recombined the bits of each register (see section 5.4), resulting in 3 BDDs each
still of size O(w). Recombining the partitions reduced the total number of BDD nodes
needed for the transition relation from $O(w^2)$ to O(w).

The CPU time needed to compute the reachable states with the two representations
shows a similar pattern (see figure 6). The graph shows CPU times in tenths of seconds on a
Sun 3/60. The asymptotic complexity in the fully partitioned case grows slightly faster than
quadratically, while the complexity with recombining is roughly linear. This compares well
with the CPU time required by Berthet, Coudert and Madre [1], which grew exponentially
with w. We also tried a least-significant bit to most-significant bit ordering. This reduced
the total number of nodes in the transition relation with 3w partitions from $O(w^2)$ to O(w),
due to sharing. However, this did not affect the time required to compute the reachable
states.

We also considered the KEY benchmark circuit¹. The KEY benchmark circuit has 258
inputs (start, encrypt and $\mathit{key}_0$ through $\mathit{key}_{255}$), 228 state variables ($\mathit{count}_0$ through $\mathit{count}_3$,
$C_0$ through $C_{111}$ and $D_0$ through $D_{111}$) and 193 outputs. The transition functions for each
of the $\mathit{count}_j$ state variables depend on start, encrypt and $\mathit{count}_i$ for $i < j$. The transition
functions for each of the $C_j$ depend on start, encrypt, $C_j$, $D_j$, $\mathit{count}_0$ through $\mathit{count}_3$, and
two of the $\mathit{key}_i$ inputs. The same is true of the transition functions for each $D_j$. Thus, the
transition functions for each of the $C_j$ and $D_j$ depend on (have a support of) exactly 10
variables.

Because the size of the support of each transition function is small, the corresponding
BDDs can be easily constructed for just about any variable ordering. Also, the particular
supports for each state variable show that the KEY circuit can be naturally viewed as 113
communicating finite automata: one automaton for the $\mathit{count}_0$ through $\mathit{count}_3$ state variables,
and, for each j from 0 to 111, one automaton containing the $C_j$ and $D_j$ variables. Each of
these automata depends on the $\mathit{count}_j$ variables, so it is natural to put those variables at
the top of the variable ordering. Also, the $C_j$ and $D_j$ should be interleaved; we used the
ordering $C_{111}, D_{111}, \ldots, C_0, D_0$ in our experiments. With this ordering of the state variables,
the largest state set BDD has 5584 nodes, which is an average of less than 25 BDD nodes
per state variable. This size is a result of limited communication between the 113
automata described above.

¹There are actually two sequential benchmark circuits called KEY, one with 228 latches [34] and one
with 58 latches [19]. We use the one with 228 latches.

If full partitioning is used, the time necessary to compute the reachable states does
not depend critically on the ordering of the input variables. However, the ordering can
be important if parts of the transition relation are recombined. We put the encrypt and
start inputs at the top of the ordering, and placed the $\mathit{key}_i$ inputs near the $C_j$ and $D_j$ that
depended on them. This ordering made it possible to use three partitions: one for the $\mathit{count}_j$
variables, one for the $C_j$ variables and one for the $D_j$ variables. The BDDs for the partitions
had 33, 2464 and 2566 nodes, respectively. The time required to find the reachable states
of the KEY benchmark circuit was 1019 seconds (CPU time on a SPARCstation 1+) when
using a fully partitioned transition relation and 41 seconds for the three partition case, a
speed up of nearly a factor of 25.

8  Verifying  Asynchronous  Circuits 

One aspect of verifying speed-independent asynchronous circuits is checking that the circuit
has no hazards. A hazard is informally defined as a state in which a gate can transition, and
in which another transition can disable the output transition of the gate. This definition of
hazards covers both static and dynamic hazards. In a real circuit, a hazard may result in
the output of the gate starting to change and then returning to its previous state, with the
result that parts of the circuit may see the transition and others may not. We can check for
hazards in two steps. First, we compute the set of states that the circuit and its environment
can reach from a given set of initial states. Then we check that none of these states results
in a hazard. Finding the reachable states is the more computationally expensive of these
two steps. In practice, checking for hazards is usually done as the reachable states are
computed. This method can be generalized to handle a wide variety of safety properties
of asynchronous circuits [22]. The set of reachable states is computed using the methods
described in section 4.
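The hazard check described above can be illustrated with a small explicit-state sketch. It is not the BDD-based procedure of the paper: states are enumerated one at a time, and the gate encoding (an output wire index plus a Python function of the state) and the two tiny example circuits are assumptions made only for illustration. A gate is treated as enabled when its output differs from the value its function computes, and a hazard is reported when firing one enabled gate disables another.

```python
from collections import deque

def enabled(gates, state):
    """Indices of gates whose output wire differs from the value they compute."""
    return [g for g, (out, fn) in enumerate(gates) if fn(state) != state[out]]

def fire(gates, state, g):
    """State after gate g alone makes its output transition."""
    out, fn = gates[g]
    nxt = list(state)
    nxt[out] = fn(state)
    return tuple(nxt)

def find_hazards(gates, initial_states):
    """Forward search that checks for hazards as each state is expanded."""
    seen = set(initial_states)
    queue = deque(initial_states)
    hazards = []  # entries are (state, disabled_gate, firing_gate)
    while queue:
        s = queue.popleft()
        en = enabled(gates, s)
        for h in en:
            t = fire(gates, s, h)
            still_enabled = enabled(gates, t)
            for g in en:
                if g != h and g not in still_enabled:
                    hazards.append((s, g, h))
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen, hazards

# Hypothetical example 1: inverter x = NOT y and buffer y = x (a two-stage ring).
ring = [(0, lambda s: 1 - s[1]), (1, lambda s: s[0])]
# Hypothetical example 2: the same ring plus an AND gate z = x AND (NOT y).
ring_with_and = ring + [(2, lambda s: s[0] & (1 - s[1]))]

if __name__ == "__main__":
    print(find_hazards(ring, [(0, 0)])[1])
    print(find_hazards(ring_with_and, [(0, 0, 0)])[1])
```

In the second circuit the AND gate and the buffer are both enabled in state (1, 0, 0); firing the buffer withdraws the AND gate's pending transition, which is the situation the informal definition describes.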

8.1  Modified  Breadth  First  Search 

Recall that asynchronous circuits can be modeled using either conjunctive or disjunctive
partitioned transition relations. These correspond to non-interleaving and interleaving
semantics, respectively. There are significant differences in the complexity of doing reachability
analysis using the two models. Consider two uncoupled systems M' and M'' with disjoint
sets of state variables V' and V''. Let M be the composition of these two systems, and let
V = V' ∪ V''. This is an unrealistic example, but it helps illustrate what happens when
computing the reachable states of loosely coupled systems. The BDD S(V) representing the set
of reachable states of M is equal to S'(V') ∧ S''(V''), where S'(V') is the BDD representing
the reachable states of M' and S''(V'') is the BDD representing the reachable states of M''.
For simplicity, assume that the sets S(V), S'(V') and S''(V'') are independent of whether
interleaving or non-interleaving semantics are used. An efficient way to order the BDD
variables of the combined system in this case is to have all the variables of one component (say
M') precede all of the variables in the other component. Then the number of BDD nodes
in S(V) is equal to the sum of the nodes in S'(V') and S''(V''). However, in spite of our
assumption that interleaving and non-interleaving semantics give the same reachable states,
the sizes of the BDDs representing the intermediate state sets are potentially different for
the two semantics.

Let S_i(V), S'_i(V') and S''_i(V'') be the BDDs representing the states reachable in i or
fewer steps by M, M' and M'', respectively, using non-interleaving semantics. Similarly,
let T_i(V), T'_i(V') and T''_i(V'') be the BDDs representing the states reachable in i or fewer
steps by M, M' and M'', respectively, using interleaving semantics. In the conjunctive
(non-interleaving) case, S_i(V) = S'_i(V') ∧ S''_i(V''), so the size of each S_i(V) is equal to the sum
of the sizes of S'_i(V') and S''_i(V''), just as for the set of reachable states. For the disjunctive
case, if a global state is reachable in at most i steps and the local state of M' is reachable
in k steps, then the local state of M'' must be reachable in at most i - k steps. Hence,

$$T_i(V) = \bigvee_{k=0}^{i} \left( T'_k(V') \wedge T''_{i-k}(V'') \right)$$

Thus, interleaving semantics introduces an artificial correlation between the local states
of M' and M'' in the T_i(V). In practice, the T_i(V) are generally much larger than the
S_i(V). Because of this effect, standard breadth first reachability analysis with disjunctive
partitioning is less efficient than with conjunctive partitioning.
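The difference between the two families of intermediate sets can be checked on a toy instance. The sketch below uses explicit Python sets rather than BDDs, and the two uncoupled components are hypothetical: each is a counter that steps from 0 up to N - 1. Under non-interleaving semantics any subset of components may move in a step; under interleaving semantics exactly one moves.

```python
from itertools import product

N = 6  # each hypothetical component has local states 0 .. N-1

def local_succ(x):
    """One local step of a component: advance, saturating at N-1."""
    return {min(x + 1, N - 1)}

def local_within(i):
    """Local states reachable from 0 in i or fewer local steps."""
    frontier, seen = {0}, {0}
    for _ in range(i):
        frontier = {y for x in frontier for y in local_succ(x)} - seen
        seen |= frontier
    return seen

def global_within(i, interleaving):
    """Global states reachable from (0, 0) in i or fewer global steps."""
    seen = {(0, 0)}
    for _ in range(i):
        new = set()
        for (a, b) in seen:
            if interleaving:
                # exactly one component moves per step
                new |= {(a2, b) for a2 in local_succ(a)}
                new |= {(a, b2) for b2 in local_succ(b)}
            else:
                # each component independently moves or stays
                new |= set(product(local_succ(a) | {a}, local_succ(b) | {b}))
        seen |= new
    return seen

def disjunction_formula(i):
    """Right-hand side of the disjunction over k of T'_k and T''_(i-k)."""
    return {(a, b)
            for k in range(i + 1)
            for a in local_within(k)
            for b in local_within(i - k)}

if __name__ == "__main__":
    for i in range(5):
        s_i = global_within(i, interleaving=False)
        t_i = global_within(i, interleaving=True)
        print(i, len(s_i), len(t_i), t_i == disjunction_formula(i))
```

In this toy case the S_i are always full products of the two local sets, whereas the T_i are "triangular" sets in which the two local states constrain each other. That coupling is the artificial correlation that makes the interleaving BDDs larger.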

We can make disjunctive partitioning more efficient by using a modified breadth first
search (MBFS) for reachability analysis. To search the reachable states of M, first compute
states reachable by transitions of wires in M'. Then compute the states reachable from that
set by transitioning on wires in M''. This is equal to the global reachable state set, since
M' and M'' are uncoupled. Separately computing local fixed points for the two parts of the
system in this way removes the artificial correlation described above.

In general, for a circuit C divided into loosely coupled subcircuits C_j, we compute the
reachable states of C by repeatedly computing local fixed points for each C_j until a global
fixed point is reached. This idea can be extended to a hierarchy with any number of levels.
For example, consider a closed system with 4 subcircuits C_0 through C_3 (see figure 7). Let
V_i be the set of state variables of C_i, and let V be the union of the V_i. Subcircuits C_i and C_k
communicate through the state variables in V_i ∩ V_k. Let O_i be the set of variables in V_i
that are driven by C_i. The O_i are pairwise disjoint and their union is equal to V. We can
construct a hierarchy where the top level splits the circuit into 2 parts: C_0 together with C_1
form one part, C_2 and C_3 form the other. The second level of the hierarchy further splits
the circuit into the individual C_j. In this case modified breadth first search proceeds by
alternately finding all the states reachable via transitions in O_0 ∪ O_1 and in O_2 ∪ O_3 until
a fixed point is reached. At each iteration, finding the states reachable via O_0 ∪ O_1 is done
by alternately finding all the states reachable via transitions in O_0 and in O_1 until a fixed
point is reached.
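The hierarchical search just described can be sketched with explicit state sets. This is an illustration under stated assumptions, not the BDD implementation used in the experiments: the hierarchy is encoded as nested Python lists of component indices, `step(component, state)` is a hypothetical function returning the successors produced by that component's outputs, and the four-component example system is invented for the demonstration.

```python
def local_fixpoint(states, node, step):
    """Close `states` under the transitions of every component below `node`.

    `node` is either a component index (a leaf) or a list of sub-nodes.
    """
    if not isinstance(node, list):
        seen, frontier = set(states), set(states)
        while frontier:
            frontier = {t for s in frontier for t in step(node, s)} - seen
            seen |= frontier
        return seen
    current = set(states)
    while True:
        before = len(current)
        for child in node:
            current = local_fixpoint(current, child, step)
        if len(current) == before:  # sets only grow, so equal size means fixed point
            return current

def plain_bfs(states, components, step):
    """Standard breadth first search over all components at once, for comparison."""
    seen, frontier = set(states), set(states)
    while frontier:
        frontier = {t for s in frontier for c in components for t in step(c, s)} - seen
        seen |= frontier
    return seen

def step(component, state):
    """Hypothetical component i: raise wire i by one, but never above wire i-1."""
    i = component
    if state[i] < 2 and (i == 0 or state[i - 1] > state[i]):
        return [state[:i] + (state[i] + 1,) + state[i + 1:]]
    return []

# Two-level hierarchy: components 0 and 1 form one part, 2 and 3 the other.
HIERARCHY = [[0, 1], [2, 3]]

if __name__ == "__main__":
    init = {(0, 0, 0, 0)}
    print(len(local_fixpoint(init, HIERARCHY, step)))
    print(len(plain_bfs(init, [0, 1, 2, 3], step)))
```

Both searches reach the same final set; they differ only in the intermediate sets visited along the way, which is where the BDD size savings would come from in a symbolic implementation.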
Figure 7: Example hierarchy for modified breadth first search.

Figure 8: Stack circuit block diagram (cells 1 through d).


8.2  An  Asynchronous  Stack 

In this subsection, we compare conjunctive and disjunctive partitioned transition relations
for verifying asynchronous circuits by considering an asynchronous lazy stack due to
Martin [28]. To determine the asymptotic performance of the various methods discussed above,
we performed a reachability analysis for stacks with varying depth d and word width w.
This is sufficient to study the asymptotic complexity of verification, even though we did not
check for hazards. Hazard checking increases the verification times by about a factor of two.

Figure 8 shows a block diagram of the stack. It consists of an array of d cells, each
cell consisting of a control part, a data part and a completion tree. The data part of each
cell consists of w storage elements. A completion tree consists of (w - 1) C-elements, each
with 2 inputs. It effectively acts as a w-input C-element and is used to signal when all the
storage elements in a cell have completed the current data transfer. The model that we used
also included an environment for the stack that nondeterministically performs push and pop
operations.

The variable ordering that we used can be understood in relation to figure 8. We ordered
the variables by scanning the figure from top to bottom, and, within each row, by scanning
from right to left. Thus, we had variables for the control part of cell d first, the control
part of cell d - 1 next, etc. After all of the control parts, we had data part 1 for cells
d through 1, together with the completion tree variables derived from those data parts. The
last variables in the order were those for data part w in cell 1 (and the associated completion
tree variables).

We  did  a  detailed  study  of  how  verification  time  varied  with  w  for  three  different  methods: 

1. Disjunctive partitioning using modified breadth first search. We combined the
transition relations for the gates making up each individual control part, each of the
individual storage elements, and each completion tree. At the top level, the hierarchy
used for local fixed point computation consisted of the environment and each cell as
a unit. Each cell was broken into the control part, the completion tree and the data
part. The data part was further subdivided into a hierarchy consisting of a balanced
binary tree with ⌈lg(w)⌉ levels.

2.  Conjunctive  partitioning  using  the  same  partitioning  of  the  transition  relation  as  above. 
We used the following ordering p of the parts of the transition relation:

(a)  the  environment  at  the  top  of  the  stack; 

(b)  the  control  part  and  data  parts  for  cell  1,  followed  by  the  control  and  data  parts 
for  cell  2,  etc. 

(c)  the  completion  tree  for  cell  1,  followed  by  the  completion  tree  for  cell  2,  etc.  and 

(d)  the  environment  at  the  bottom  of  the  stack. 

3.  Conjunctive  partitioning  using  the  same  partitioning  as  above,  but  with  the  control 
and  data  parts  within  each  cell  combined  into  one  BDD.  The  p  used  above  is  modified 
in  the  obvious  way. 

In  all  cases,  we  used  an  initial  state  set  in  which  each  cell  could  be  full  or  empty  and  the  data 
in  each  cell  was  arbitrary.  Using  a  more  restricted  set  of  initial  states,  such  as  having  all  cells 
initially  empty,  can  increase  the  verification  time  by  as  much  as  a  factor  of  d.  Interleaving 
semantics  (method  1)  and  non-interleaving  semantics  (methods  2  and  3)  both  produced 
exactly  the  same  set  of  reachable  states  for  the  stack  circuit. 

We  also  experimented  with  disjunctive  partitioning  using  standard  breadth  first  search. 
However, we found that this method was feasible only for small examples. Disjunctive
partitioning  with  modified  breadth  first  search  and  conjunctive  partitioning  were  both  much 
more  efficient. 

A graph of the search times versus stack width for the three methods is shown in figure 9.
Search times using methods 1 and 2 grew at about w^[illegible] and w^[illegible], respectively. Method 3
gave a growth rate of roughly w^[illegible]. Using this method, we were able to find the reachable
states of a 32 bit wide, depth 2 stack in 38 minutes of CPU time on a SPARCstation 1+.
This circuit had 989 boolean state variables and over 10^[illegible] reachable states.

Figure 9: Search times in seconds for stacks of various widths, with d = 1.

Figure 10: Search times in seconds for stacks of various depths, with w = 1.

The BDDs in the transition relation are all of constant or linear size, except for those
representing the completion trees. For both interleaving and non-interleaving semantics, a
simple analysis of the BDDs for the completion trees shows that for w equal to a power of 2,
the number of nodes f(w) must satisfy the equation (to first order)

$$f(2w) = 3 f(w).$$

When f is extended to values of w that are not powers of 2, it is still a monotonically
increasing function. As a result, the above equation is sufficient to show that f(w) is O(w^{lg 3}).
For the values of w we considered in our experiments, a single BDD for each completion tree
requires only a small number of nodes. However, for larger w, it might be necessary to split
the completion trees into more than one BDD.
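The step from the recurrence to a polynomial bound can be spelled out as follows. This is a sketch that treats the first-order recurrence f(2w) = 3 f(w) as exact, writes lg for the base-2 logarithm, and uses only the monotonicity of f stated above.

```latex
% Unrolling the recurrence on powers of two:
f(2^k) = 3^k f(1) = \left(2^{\lg 3}\right)^{k} f(1) = \left(2^k\right)^{\lg 3} f(1).

% For general w, choose k with 2^k \le w < 2^{k+1}; monotonicity of f gives
f(w) \le f(2^{k+1}) = 3 \cdot 3^{k} f(1) = 3 \left(2^k\right)^{\lg 3} f(1) \le 3 f(1)\, w^{\lg 3}.

% Hence f(w) = O\!\left(w^{\lg 3}\right), with \lg 3 \approx 1.585.
```

The exponent lies between 1 and 2, so under this analysis the completion-tree BDDs grow faster than linearly but slower than quadratically in the word width.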

We  also  explored  how  the  search  time  varied  with  the  depth  of  the  stack  (see  figure  10). 
The  number  of  steps  needed  to  compute  the  reachable  states  grows  quadratically  in  d.  The 
states  which  require  the  largest  number  of  steps  to  reach  are  states  in  which  internal  signals 
within  the  stack  control  are  not  stable.  Thus,  we  were  able  to  avoid  the  quadratic  search 
depth  by  replacing  the  control  part  of  each  cell  by  an  abstract  model  having  only  external 
signals.  We  separately  verified  (using  a  variant  of  Dill’s  explicit  state  verifier  [22])  that 
the abstract model correctly describes the external behavior of the control part. With this
abstraction, the number of steps needed to find a fixed point is linear in d. The search times
grow  as  for  disjunctive  partitioning  and  as  for  conjunctive  partitioning. 

Although this kind of abstraction can greatly improve the efficiency of verifiers that
explicitly enumerate states, it is usually not nearly as helpful when used with symbolic
verifiers. For example, the search times for stacks of depth one improve only about 20 percent
when the abstract model of the control part is used. The effect of abstraction on the search
depth is an exception to this rule.

The maximum sizes of the state set BDDs encountered during the searches are shown in
figures 11 and 12. The state sets grow slightly faster than linearly with width (probably due
to  the  completion  trees).  They  grow  approximately  quadratically  with  depth  when  we  use 
the  abstract  model  of  the  control  part  of  each  cell. 

8.3  Distributed  Mutual  Exclusion 

As  another  example,  we  considered  the  verification  of  an  asynchronous  circuit  for  ensuring 
mutually exclusive access to a shared resource. This circuit is also due to Martin [27, 23].
The  circuit  consists  of  a  ring  of  c  cells.  Each  cell  communicates  with  a  user  of  the  resource 
and  with  its  left  and  right  neighbors  in  the  ring.  Mutual  exclusion  is  ensured  by  having  a 
single  “token”  that  is  passed  around  the  ring.  A  cell  must  have  the  token  before  granting 
access  to  its  user.  The  distributed  mutual  exclusion  circuit  is  an  example  of  an  asynchronous 
circuit  with  complex  control  and  no  data  path. 

The  variable  ordering  we  used  had  the  variables  for  each  cell  grouped  together.  The  first 
variables  in  the  order  were  those  for  cell  1,  and  the  last  were  those  for  cell  c. 

Figure 11: State set BDD sizes for stacks of various widths, with d = 1.

Figure 12: State set BDD sizes for stacks of various depths, with w = 1.

We studied how the complexity of reachability analysis varied with c for a variety of cell
models and search techniques:

1. Disjunctive partitioning using modified breadth first search. We combined the
transition relations for the gates making up each individual cell, so the number of elements
in the partitioning was equal to the number of cells. The hierarchy used for local fixed
point computation was obtained by repeatedly splitting the set of cells in half. The
cells that were connected in the circuit were grouped together in the hierarchy.

2. Conjunctive partitioning with the first c - 1 cells combined and the last cell as a separate
part of the transition relation. We left the last cell separate because it introduces
constraints between some of the variables at the top and bottom of the BDD variable
ordering. The last cell was processed first, followed by the combined group of c - 1 cells.

3. Disjunctive partitioning as in item 1, but using an abstract model of the cell.

4. Conjunctive partitioning as in item 2, but using an abstract model of the cell.

In  all  cases,  we  used  an  initial  state  set  of  c  states,  each  with  the  token  in  a  different  cell. 
Interleaving  semantics  (methods  1  and  3)  and  non-interleaving  semantics  (methods  2  and  4) 
both  produced  exactly  the  same  set  of  reachable  states  for  the  distributed  mutual  exclusion 
circuit. 

A graph of the search times versus number of cells for the various methods is shown
in figure 13. Disjunctive partitioning with modified breadth first search and conjunctive
partitioning were again roughly comparable, with the former having a lower asymptotic
complexity. This contrasts with the stack example, where the combined conjunctive
partitioning was faster. This difference is probably because the complexity of the stack is in its
data path, while the complexity of the mutual exclusion circuit is due to control rather than
data. The verification times for the four methods grow as c to the power of 2.1, 3.1, 2.3
and 3.5, respectively. The largest unabstracted circuit that we examined had 16 cells, 256
boolean state variables, and over 10^[illegible] reachable states. It took slightly less than 30 minutes
of CPU time on a SPARCstation 1+ to find the reachable states.

Figure 13: DME circuit verification times.

Figure 14: State set BDD sizes for DME circuit.

The total number of BDD nodes needed to represent the transition relation grew linearly
in c in all cases. The maximum state set BDD sizes are shown in figure 14. In the case of
the conjunctive methods, these BDDs grow approximately cubically, while with disjunctive
partitioning and MBFS, the growth rate is reduced to linear.

9  Discussion 

We have described a BDD-based algorithm for CTL model checking with fairness constraints.
The use of modified breadth first search for reachability analysis has also been described,
as well as the advantages of viewing reachability analysis as a method for constructing and
checking invariants. All of these methods are made significantly more efficient by the use of
partitioned transition relations. We have empirically studied the asymptotic complexity of
verifying both synchronous and asynchronous circuits. In all cases, the verification time for
these circuits grew as a small polynomial in the number of circuit components.

Two of the features of our verification methods are the use of transition
relations and the amount of guidance the user provides to the verifier. These features are
discussed in more detail below.

9.1 Transition Relations

Our  verification  methods  use  relations  to  describe  how  circuits  can  transition  from  one  state 
to  another.  We  considered  both  monolithic  transition  relations  (which  are  represented  by  a 
single  BDD)  and  partitioned  transition  relations  (more  than  one  BDD).  For  deterministic 
systems,  this  information  can  be  represented  using  transition  function  vectors  instead.  In 
this  method,  a  separate  BDD  is  used  for  each  Boolean  state  variable  of  the  system.  This 
BDD  represents  the  function  computed  by  the  combinational  logic  driving  the  associated 
latch. Coudert et al. [18, 20] describe a number of algorithms for manipulating transition
functions. 

Of these three methods of representing transitions (transition functions and monolithic
and partitioned transition relations), we believe a partitioned transition relation usually
gives the best performance. A monolithic transition relation can require many more BDD
nodes than a corresponding transition function vector [20] or partitioned transition relation.
However, when a monolithic transition relation is not too large to store in semiconductor
memory, computations with the transition relation appear to be faster than those using
transition functions. This observation was also made by Coudert et al. [20] when they
compared  transition  relations  to  their  techniques  based  on  transition  functions.  In  addition, 
while  we  have  demonstrated  polynomial  growth  in  verification  time  for  several  classes  of 
circuits  using  transition  relations,  no  state  exploration  method  based  on  transition  functions 
has  shown  such  results.  Our  empirical  results  indicate  that  partitioned  transition  relations 
give  both  the  speed  of  transition  relations  and  the  memory  efficiency  of  transition  functions. 

Touati et al. [34] independently proposed another method for representing transition
relations as implicit conjunctions. They use the constrain operator of Coudert et al. [18] to
eliminate the state set S(V) in equation 2. Then they compute the resulting conjunction
as a balanced binary tree, quantifying out each variable in V when all the BDDs depending
on that variable have been combined. We believe that this method is inferior to the one
proposed here because the constrain operator may introduce dependencies on any of the
variables in S(V). Generally, S(V) depends on all or nearly all of the variables in V. Thus,
after  applying  the  constrain  operator,  all  of  the  partitions  of  the  transition  relations  may 
depend  on  most  of  the  variables  in  V.  As  a  result,  it  may  not  be  possible  to  quantify  out 
many  variables  before  performing  the  final  conjunction,  greatly  reducing  the  effectiveness 
of early quantification. Touati et al. also suggest having one transition relation per state
variable.  As  we  have  described,  it  is  often  better  to  combine  parts  of  the  transition  relations. 
This  idea  is  also  applicable  to  their  method. 
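The role of early quantification in a relational product over a conjunctively partitioned transition relation can be sketched with explicit relations in place of BDDs. Everything named below is an assumption made for illustration: relations are stored as a tuple of variable names plus a set of rows, the example system is a 3-bit counter with current-state variables x0..x2 and next-state variables y0..y2, and the partition order is chosen by hand. A current-state variable is quantified out as soon as no later partition mentions it.

```python
from itertools import product

def relation(variables, predicate):
    """Explicit relation: all 0/1 rows over `variables` satisfying `predicate`."""
    variables = tuple(variables)
    rows = {bits for bits in product((0, 1), repeat=len(variables))
            if predicate(dict(zip(variables, bits)))}
    return variables, rows

def conjoin(r1, r2):
    """Conjunction of two relations (a natural join on shared variables)."""
    (v1, rows1), (v2, rows2) = r1, r2
    shared = [v for v in v1 if v in v2]
    extra = [v for v in v2 if v not in v1]
    out = set()
    for a in rows1:
        ea = dict(zip(v1, a))
        for b in rows2:
            eb = dict(zip(v2, b))
            if all(ea[v] == eb[v] for v in shared):
                out.add(a + tuple(eb[v] for v in extra))
    return v1 + tuple(extra), out

def exists(r, quantified):
    """Existentially quantify (project away) the named variables."""
    v, rows = r
    keep = [i for i, name in enumerate(v) if name not in quantified]
    return tuple(v[i] for i in keep), {tuple(row[i] for i in keep) for row in rows}

def image_early(state_set, partitions, current_vars):
    """Relational product, quantifying each current variable as early as possible."""
    acc = state_set
    for k, part in enumerate(partitions):
        acc = conjoin(acc, part)
        later = {v for p in partitions[k + 1:] for v in p[0]}
        dead = {v for v in acc[0] if v in current_vars and v not in later}
        acc = exists(acc, dead)
    return acc

def image_monolithic(state_set, partitions, current_vars):
    """Same product, but conjoin everything first and quantify at the end."""
    acc = state_set
    for part in partitions:
        acc = conjoin(acc, part)
    return exists(acc, set(current_vars))

def rows_as_dicts(r):
    """Order-independent view of a relation, for comparing results."""
    v, rows = r
    return {frozenset(zip(v, row)) for row in rows}

CURRENT = ("x0", "x1", "x2")
# Hypothetical 3-bit counter, one partition per next-state bit, widest support first.
PARTS = [
    relation(("x0", "x1", "x2", "y2"), lambda e: e["y2"] == (e["x2"] ^ (e["x0"] & e["x1"]))),
    relation(("x0", "x1", "y1"), lambda e: e["y1"] == (e["x1"] ^ e["x0"])),
    relation(("x0", "y0"), lambda e: e["y0"] == 1 - e["x0"]),
]
# State set containing counter values 0 and 3 (x0 is the low bit).
STATES = relation(CURRENT, lambda e: (e["x0"], e["x1"], e["x2"]) in {(0, 0, 0), (1, 1, 0)})

if __name__ == "__main__":
    print(rows_as_dicts(image_early(STATES, PARTS, CURRENT)))
```

In this sketch the intermediate result never mentions more than four variables under early quantification, whereas the monolithic version builds a relation over all six before projecting. If every partition were made to depend on most of the current-state variables, as the passage above argues can happen after applying the constrain operator, the set of quantifiable variables at each step would be empty until the end.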

We implemented their method and tested it on some of the examples in section 7. For a
pipeline with four 8 bit registers and one pipe register, our method was more than five times
faster. In addition, for some of the relational product computations, the intermediate BDDs
using their method were more than an order of magnitude larger than the final result.
The two methods have been applied to the KEY benchmark and the MINMAX benchmark
with a 32 bit wide data path. We computed the reachable states of these circuits
in 41 seconds (CPU time on a SPARCstation 1+) and less than 4 seconds (CPU time on
a Sun 3/60), respectively (see section 7.2). Touati et al. [34] reported run times (on a
DEC 5400) of 5706 seconds and 444 seconds, respectively. Their data includes the time
needed for parsing input files, computing the reachable states of the product automata of
two identical circuits, and checking equality of the outputs of the two automata. Although
these times are difficult to compare directly, a speed up of two orders of magnitude suggests
that our method performs better on these two benchmarks. Empirical results on additional
benchmarks are required, however, before a definitive conclusion can be reached.

9.2 Degree of Automation

State exploration based verification methods tend to be more automatic than methods based
on theorem provers. This is particularly true when attempting to verify a circuit that is not
correct. State exploration methods can easily produce a counter-example trace that helps
the user find errors in the circuit. With a theorem prover, the user only knows that the
attempted proof will not go through, without knowing whether this is because of a circuit
error or a weakness in the theorem prover.

Although  symbolic  state  exploration  methods  are  much  more  automatic  than  using  a 
theorem  prover,  it  is  still  necessary  for  the  user  to  do  more  than  just  provide  a  specification 
and a circuit description. In this section, we consider some of the decisions that the user
must  make. 

First, the user must choose one of the many techniques in the literature, such as forward
and reverse reachability analysis, CTL model checking, etc. This is a difficult decision;
determining general rules about which methods perform well on what kinds of circuits is
still an open research question. There may be no alternative other than to try several
methods. It is useful to start with a small model of the circuit to be verified, either by
abstracting out much of the functionality of the circuit, or as we have done in this paper,
by parameterizing the data path width, number of registers, etc. If a particular method
performs better on a smaller version of the circuit, it is likely to perform better on the full
circuit, as well.

Example  verification  attempts  on  particular  circuits,  such  as  those  in  this  paper,  are 
also  helpful.  We  described  three  different  methods:  CTL  model  checking,  and  forward  and 
reverse  reachability  analysis.  Only  model  checking  worked  well  on  the  synchronous  pipelined 
ALU  circuit  that  we  considered.  With  forward  reachability  analysis  the  BDD  needed  to 
represent  the  set  of  states  reachable  in  one  step  was  exponential  in  the  size  of  the  circuit. 
However,  when  we  checked  for  hazards  in  two  asynchronous  circuits,  forward  reachability 
analysis  was  quite  efficient.  In  this  case,  the  set  of  states  with  hazards  was  represented  by 
an implicit disjunction of BDDs, one BDD for each component of the circuit. If we had
used  reverse  reachability  analysis  or  CTL  model  checking,  then  we  would  have  had  to  do  a 
separate  analysis  for  each  of  the  disjuncts.  Also,  the  BDDs  for  reverse  reachability  analysis 
of  asynchronous  circuits  tend  to  be  much  larger  than  for  forward  analysis;  the  structure 
inherent  in  the  set  of  reachable  states  is  lost. 

With all BDD-based verification methods, a good variable ordering must be found. When
partitioned transition relations are used, the most important factor is the size of the BDDs
representing the many state sets computed during verification. In our experiments, we often
found ways to improve a variable ordering by trying it on a small version of the circuit.
We plotted profiles of the BDDs that were computed. A profile is a graph that shows,
for each variable, the number of BDD nodes labeled by that variable. Profiles can provide
information about how information flow in the circuit results in large BDDs, and how the
variable order might be modified to make the BDDs smaller. We found that a little time
pondering variable orders paid off with drastically reduced verification times.
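A profile of the kind described above can be computed with a very small reduced ordered BDD builder. The sketch below is an assumption-laden illustration rather than the tool used in the experiments: it builds the BDD by brute-force Shannon expansion (exponential in the number of variables, so suitable only for tiny functions), and the example function and variable names are hypothetical.

```python
from collections import Counter

def bdd_profile(func, order):
    """Number of reduced-ordered-BDD nodes labeled by each variable of `func`.

    `func` maps a dict {variable: 0 or 1} to a truth value; `order` lists the
    variables from the top of the BDD to the bottom.
    """
    unique = {}  # (variable, low child, high child) -> node id (ids start at 2)

    def build(level, env):
        if level == len(order):
            return bool(func(env))  # terminal node
        var = order[level]
        low = build(level + 1, {**env, var: 0})
        high = build(level + 1, {**env, var: 1})
        if low == high:  # redundant test: skip the node
            return low
        return unique.setdefault((var, low, high), len(unique) + 2)

    build(0, {})
    return Counter(var for (var, _, _) in unique)

def pairs(env):
    """Hypothetical example: (a1 AND b1) OR (a2 AND b2) OR (a3 AND b3)."""
    return any(env[f"a{i}"] and env[f"b{i}"] for i in (1, 2, 3))

INTERLEAVED = ["a1", "b1", "a2", "b2", "a3", "b3"]
SEPARATED = ["a1", "a2", "a3", "b1", "b2", "b3"]

if __name__ == "__main__":
    for name, order in (("interleaved", INTERLEAVED), ("separated", SEPARATED)):
        profile = bdd_profile(pairs, order)
        print(name, [profile[v] for v in order], sum(profile.values()))
```

With the related variables interleaved, every level holds a single node. With the a variables placed ahead of all the b variables, the profile bulges in the middle of the order, which is the kind of visual cue that suggests moving communicating variables closer together.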

The user must also provide a partitioning of the transition relations. This is not as
critical as the variable ordering; reasonable results can be obtained by simply having one
partition for each state variable of the circuit. However, we have shown that even better
results can be obtained by combining some of the partitions together. Finding a good way to
combine partitions was not a problem; usually our first guess worked quite well. We suspect
the process could be easily automated, given information about the hierarchical structure of
the circuit.

It appears more difficult to choose automatically the order in which partitions are used
when computing relational products. Nonetheless, in practice it was not difficult to choose
an ordering by hand. The orderings used in our experiments were based on the natural
flow of information in the circuits. Again, we found it helpful to test our choices on a small
version of the circuit being verified.

If modified breadth first search is used, then partitioning the transition relation is made
slightly more complicated by the inclusion of hierarchical information to guide the search.
Assume the user has already chosen a partitioning for standard breadth first search. Then,
given an explicit representation of the hierarchical structure of the circuit, it is
straightforward to automate the process of finding a hierarchy to guide a modified breadth first
search.

Viewing reachability analysis as a way of helping the user construct an invariant provides
a way for the user to help the machine produce the invariant by changing the set of initial
states. For the asynchronous circuits we considered, it is easy to manually produce an
expression for all of the reachable quiescent states of the circuit. We suspect this is true in
general. The verifier performed quite well on our example circuits when the set of quiescent
states was used as a starting point for constructing the invariant.

In our experience, all of the decisions the user must make to use our verification
methods are straightforward given an understanding of the circuit's operation and of the basic
properties of BDDs. Some of these decisions could be easily automated; other areas appear
better  left  to  the  user,  at  least  given  the  current  state  of  the  art.  Although  providing  these 
hints  to  the  verifier  requires  some  extra  effort  on  the  part  of  the  user,  it  is  often  justified  by 
the  significant  improvement  in  performance  that  can  result. 

What is the best balance between automation and manual effort in a BDD-based verification
method? The answer to this question depends on the situation in which the method
is to be used. If the goal is to produce a verification tool that can be used with a minimum
of training and without expert assistance, then full automation is of paramount importance.
The power of the methods described here could not be used fully in such a verification tool.
Other more automatic methods [1, 24, 34] might be more appropriate in this situation.

Although fully automatic verification methods have become much more powerful in the
last several years, there are still severe restrictions on the size of the circuits to which they
can be applied. Our empirical results suggest that a small amount of manual assistance can
greatly improve the scalability of BDD-based verification techniques. Improving scalability
requires more than just a constant factor speed up; it requires a drastic reduction in the
rate at which verification time increases as a function of increasing circuit size (for example,
exponential growth reduced to quadratic growth). Such a reduction in growth rate can only
be demonstrated by asymptotic analysis, such as the kind of empirical analysis used in this
paper.
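
One simple way to carry out such an empirical analysis is sketched below. The slope of log(time) against log(size) between consecutive measurements stays near a constant k when growth is polynomial of degree k, and keeps increasing when growth is exponential. The timing values are synthetic, generated from n^2 and 2^n purely for illustration; they are not measurements from our experiments, and the function name is hypothetical.

```python
import math

def local_loglog_slopes(sizes, times):
    """Slope of log(time) against log(size) between consecutive measurements."""
    points = list(zip(sizes, times))
    return [
        (math.log(t2) - math.log(t1)) / (math.log(n2) - math.log(n1))
        for (n1, t1), (n2, t2) in zip(points, points[1:])
    ]

sizes = [4, 8, 16, 32]
quadratic_times = [n ** 2 for n in sizes]     # synthetic: polynomial growth
exponential_times = [2 ** n for n in sizes]   # synthetic: exponential growth

quad_slopes = local_loglog_slopes(sizes, quadratic_times)
exp_slopes = local_loglog_slopes(sizes, exponential_times)
print(quad_slopes)
print(exp_slopes)
```

For the quadratic series every local slope is approximately 2, whereas for the exponential series the slope grows with each doubling of the size. A constant factor speed up shifts the curve without changing these slopes, which is why it does not improve scalability.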

If further research confirms that manual assistance can improve scalability, then we see
two ways that development of manually assisted verification methods can have direct practical
value. The first, quite naturally, is applying these methods to verification problems
that are beyond the state of the art for fully automatic verification tools. Manual assistance
would still be potentially costly, in terms of time and necessary expertise, but formal
verification would not be possible otherwise in this situation. The second use would be to shed
light on potential improvements to existing fully automatic techniques. We view the current
paper as an example of this. We have described the kind of manual assistance required in
our methods; if these parts of the verification process could be efficiently automated, the
result would be a more powerful fully automatic verification technique.